Metabolic data of endosymbiotic, parasitic and free-living bacteria
The SBML format generated by SymbioCyc was inspired by the SBML format used by the Systems Biology Research Group at UCSD.
Visit the SBML site for a general description of SBML files.
  1. The identifiers of the compounds (species) and reactions from Pathway Tools are transformed to be compatible with the SBML format, in the same way as in the LISP function "biocyc2sbml.lisp" developed by Jeremy Zucker.

    • SBML unique identifiers may only contain numbers, letters or underscores. Furthermore, the first character cannot be a number. According to the BNF grammar on page 7 of the level 2 SBML spec:

      letter ::= 'a'..'z', 'A'..'Z'
      digit ::= '0'..'9'
      nameChar ::= letter | digit | '_'
      name ::= ( letter | '_' ) nameChar*

      This is in contrast to Biocyc unique identifiers which may contain parentheses, dashes, or html markup.

      In order to ensure that no information is lost when converting a Biocyc id to an SID, the following algorithm is employed:

      1. If the first character is not a letter, prepend a single underscore, e.g. 2-OCTAPRENYLPHENOL becomes _2-OCTAPRENYLPHENOL
      2. For each character in the Biocyc id, if the character is not alphanumeric or an underscore, replace it with its ASCII value enclosed in double underscores, e.g. _2-OCTAPRENYLPHENOL becomes _2__45__OCTAPRENYLPHENOL

      Note that this algorithm is reversible as long as Biocyc never uses an underscore at the beginning of an id and never happens to have an id with a number delimited by double underscores. Fortunately, it does not.
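The two steps above, and their inverse, can be sketched in Python as follows. This is only an illustration of the described algorithm, not the code of "biocyc2sbml.lisp"; the function names `biocyc_id_to_sid` and `sid_to_biocyc_id` are hypothetical, and the sketch assumes identifiers contain ASCII characters only.

```python
import re

_LETTER = re.compile(r"[A-Za-z]")            # SBML 'letter'
_NOT_NAMECHAR = re.compile(r"[^A-Za-z0-9_]")  # anything outside 'nameChar'
_ESCAPED = re.compile(r"__(\d+)__")           # an encoded character code


def biocyc_id_to_sid(biocyc_id):
    """Encode a BioCyc id as an SBML SId (hypothetical helper)."""
    # Step 1: prepend an underscore if the first character is not a letter.
    if not _LETTER.match(biocyc_id):
        biocyc_id = "_" + biocyc_id
    # Step 2: replace each illegal character with __<character code>__.
    return _NOT_NAMECHAR.sub(lambda m: "__%d__" % ord(m.group()), biocyc_id)


def sid_to_biocyc_id(sid):
    """Reverse the encoding, under the assumptions stated in the note above."""
    decoded = _ESCAPED.sub(lambda m: chr(int(m.group(1))), sid)
    # Undo step 1: a leading underscore can only come from the encoder.
    return decoded[1:] if decoded.startswith("_") else decoded


sid = biocyc_id_to_sid("2-OCTAPRENYLPHENOL")
print(sid)                    # _2__45__OCTAPRENYLPHENOL
print(sid_to_biocyc_id(sid))  # 2-OCTAPRENYLPHENOL
```

The decoder relies on exactly the two conditions named in the note: no BioCyc id begins with an underscore, and none contains a number enclosed in double underscores.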

    • In the notes section, XHTML does not appear to recognize entities such as &beta; and &gamma;. Thus, the following characters are replaced:

      1. & becomes &amp;
      2. < becomes &lt;
      3. > becomes &gt;
      4. " becomes &quot;
      5. ' becomes &apos;
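As a minimal sketch, the five replacements can be reproduced with the Python standard library. The helper name `escape_for_notes` is hypothetical, and this is not necessarily how SymbioCyc performs the substitution.

```python
from xml.sax.saxutils import escape


def escape_for_notes(text):
    """Apply the five replacements listed above (hypothetical helper)."""
    # escape() handles &, < and > (the ampersand first, so that the other
    # replacements are not escaped twice); the quotes are passed as extras.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(escape_for_notes("&gamma;-glutamyl cycle"))  # &amp;gamma;-glutamyl cycle
```

Replacing the ampersand before the other characters matters: doing it last would turn `&lt;` into `&amp;lt;`.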


    • Coefficients of a reaction had to be normalized in order to be accepted by the SBML spec. Fortunately, the newest level 2 specification accepts floating point numbers for stoichiometry:

      Biocyc coefficient ==> SBML stoichiometry
      N ==> 1
      2N ==> 2
      M ==> 1
      0.5d0 ==> 0.5
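The four mappings in the table can be reproduced with a short Python sketch. The function name `normalize_coefficient` is hypothetical, and the behaviour for coefficient forms not shown in the table is an assumption rather than a description of what SymbioCyc does.

```python
import re

# An optional integer followed by a single symbolic letter, e.g. N, 2N, M.
_SYMBOLIC = re.compile(r"(\d*)[A-Za-z]")


def normalize_coefficient(coefficient):
    """Map a BioCyc coefficient string to a numeric SBML stoichiometry."""
    text = coefficient.strip()
    symbolic = _SYMBOLIC.fullmatch(text)
    if symbolic:
        # Drop the symbol and keep the integer prefix (1 if there is none).
        return float(symbolic.group(1) or 1)
    # Lisp double-float notation uses 'd' as exponent marker: 0.5d0 is 0.5e0.
    return float(text.replace("d", "e").replace("D", "e"))


for value in ("N", "2N", "M", "0.5d0"):
    print(value, "==>", normalize_coefficient(value))
```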

  2. No compartment information appears in the SBML files generated by SymbioCyc: only one compartment ("cytoplasm") is present.
    Likewise, a compound that is transported should have an identifier for each compartment. For the moment, these problems are not solved in SymbioCyc.


  3. The relationships between genes, enzymes and reactions are written in the notes element.
    Example:
    
        <notes>
        <html:p>GENE_ASSOCIATION: ( BU278_trpB ) or ( BU277_trpA )</html:p>
        <html:p>PROTEIN_ASSOCIATION: ( Tryptophan synthase beta chain//RXN0-2382//TRYPSYN-RXN//Tryptophan synthase ) or ( Tryptophan synthase alpha chain//TRYPSYN-RXN//Tryptophan synthase )</html:p>
        ...
        </notes>
    
        
    This indicates that the reaction can be catalysed by two different enzymes (each enclosed in parentheses and separated by "or"). An enzyme composed of several monomers lists them separated by "and", each monomer being coded by one gene. Note that this notation does not take gene splicing into account.
    "NA" indicates that no information is available about the gene or the protein.

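A GENE_ASSOCIATION value can be split into its alternatives with a few lines of Python. This is a sketch under the assumption that gene names never contain parentheses; the function name `parse_gene_association` and the gene names g1, g2 and g3 in the second call are hypothetical.

```python
import re


def parse_gene_association(value):
    """Return a list of enzymes, each a list of the genes coding its monomers."""
    value = value.strip()
    if value == "NA":
        return []  # no information available
    # Each parenthesised group is one alternative enzyme ("or").
    groups = re.findall(r"\(([^()]*)\)", value)
    # Within a group, monomers are separated by "and".
    return [re.split(r"\s+and\s+", group.strip()) for group in groups]


print(parse_gene_association("( BU278_trpB ) or ( BU277_trpA )"))
# [['BU278_trpB'], ['BU277_trpA']]
print(parse_gene_association("( g1 and g2 ) or ( g3 )"))
# [['g1', 'g2'], ['g3']]
```

The same approach would not be safe for PROTEIN_ASSOCIATION, since protein names may themselves contain parentheses.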

  4. The metabolic pathways in which the reaction occurs are indicated in the notes element, in the "<html:p>SUBSYSTEM:" fields.
    Example:
        <notes>
        ...
        <html:p>SUBSYSTEM: glutathione biosynthesis</html:p>
        <html:p>SUBSYSTEM: &gamma;-glutamyl cycle</html:p>
    
        ...
        </notes>
        
    "NA" indicates that the reaction does not occur in any metabolic pathway.


  5. The EC number is indicated in the notes element, in the "<html:p>PROTEIN_CLASS:" field.
    Example:
        <notes>
        ...
        <html:p>PROTEIN_CLASS: 6.3.2.2</html:p>
        ...
        </notes>
        
    "NA" indicates that the information is not available.
  6. When available, the side compounds identified by BioCyc are listed for each reaction. A compound is marked as a side compound of a reaction if BioCyc represents it as a side compound in every metabolic pathway in which the reaction is involved. Each side compound is given in the notes.
    Example:
        <notes>
        ...
        <html:p>SIDE: ADP</html:p>
        <html:p>SIDE: ATP</html:p>
        ...
        </notes>
        
  7. When the information is available, some compounds are marked as cofactors in a reaction. The list of cofactor transformations used to mark the compounds is available here.
    Example:
        <notes>
        ...
        <html:p>COFACTOR: ADP</html:p>
        <html:p>COFACTOR: ATP</html:p>
        ...
        </notes>
        
  8. Also in the notes, the GENERIC field indicates whether the reaction involves class compounds.
    Example:
        <notes>
        ...
        <html:p>GENERIC: false</html:p>
        ...
        </notes>
        
  9. In BioCyc, reactions are classified as small-molecule or macromolecule reactions. When this information is available, the TYPE field indicates to which of the two classes the reaction belongs.
    Example:
        <notes>
        ...
        <html:p>TYPE: small</html:p>
        ...
        </notes>
        or
        <notes>
        ...
        <html:p>TYPE: macro</html:p>
        ...
        </notes>
        
  10. If, in the source database, the reaction is not marked as spontaneous and no enzyme has been assigned to catalyse it, the reaction is considered a 'hole'. In the extended SBML format, this information, when available, is indicated in the notes.
    Example:
        <notes>
        ...
        <html:p>HOLE: true</html:p>
        ...
        </notes>
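The notes fields described in items 3 to 10 all follow the same "KEY: value" pattern, so they can be collected with the Python standard library. The sketch below assumes that the html: prefix is bound to the XHTML namespace, which the examples above do not show; the function name `parse_notes` is hypothetical, and the notes element is assembled from the examples above for illustration rather than copied from a real file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the html: prefix is bound to the XHTML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_notes(notes_element):
    """Collect the 'KEY: value' paragraphs of a notes element into a dict."""
    fields = defaultdict(list)
    for paragraph in notes_element.iter("{%s}p" % XHTML_NS):
        text = (paragraph.text or "").strip()
        key, separator, value = text.partition(":")
        if separator:  # skip paragraphs that are not 'KEY: value'
            fields[key.strip()].append(value.strip())
    return dict(fields)


example = """
<notes xmlns:html="http://www.w3.org/1999/xhtml">
  <html:p>SUBSYSTEM: glutathione biosynthesis</html:p>
  <html:p>PROTEIN_CLASS: 6.3.2.2</html:p>
  <html:p>SIDE: ADP</html:p>
  <html:p>SIDE: ATP</html:p>
  <html:p>GENERIC: false</html:p>
  <html:p>TYPE: small</html:p>
</notes>
"""

fields = parse_notes(ET.fromstring(example.strip()))
print(fields["SIDE"])           # ['ADP', 'ATP']
print(fields["PROTEIN_CLASS"])  # ['6.3.2.2']
```

Every key maps to a list because fields such as SUBSYSTEM, SIDE and COFACTOR may occur several times for one reaction.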
        


This project was developed by Ludovic COTTRET in the Baobab Team and the BF2I.