ACNUC physical structure

An ACNUC database is made of a series of index files (ACCESS, AUTHOR, BIBLIO, EXTRACT, KEYWORDS, LOCUS, LONGL, SHORTL, SMJYT, SPECIES, SUBSEQ, TEXT) that allow efficient access to sequences and annotations through a variety of selection criteria. The sequences and annotations themselves are stored in flat files (e.g., fun.dat, gbbct1.seq) created by the database producers (e.g., EMBL, GenBank, SwissProt); ACNUC accesses these files in strictly read-only mode.

One-page summary of structure.

Each index file is a series of fixed-length records. A record contains several fields, which are 4-byte unsigned integer values unless indicated otherwise. Records are referred to by their number (or rank), counting from 1.

The first record of all index files follows this structure:
total |sorting state|end_sorted|
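
The record addressing described above can be sketched in Python. This is only an illustration, not ACNUC's own code: the helper names read_record and read_first_record are hypothetical, the record size must be supplied by the caller, and the little-endian byte order is an assumption. The text above states only that fields are 4-byte unsigned integers and that ranks count from 1.

```python
import io
import struct


def read_record(f, rank, record_size):
    """Return the raw bytes of record number `rank`, counting from 1."""
    if rank < 1:
        raise ValueError("record ranks start at 1")
    # Record 1 sits at offset 0, record 2 at offset record_size, and so on.
    f.seek((rank - 1) * record_size)
    data = f.read(record_size)
    if len(data) != record_size:
        raise EOFError(f"no complete record at rank {rank}")
    return data


def read_first_record(f, record_size, byte_order="<"):
    """Decode total, sorting state and end_sorted from record 1.

    byte_order is an assumption: "<" is little-endian, ">" is big-endian.
    """
    raw = read_record(f, 1, record_size)
    total, sorting_state, end_sorted = struct.unpack(byte_order + "3I", raw[:12])
    return {"total": total, "sorting_state": sorting_state, "end_sorted": end_sorted}


# In-memory stand-in for an index file with 16-byte records (made-up values).
demo = io.BytesIO(struct.pack("<3I", 2, 1, 2) + b"\0" * 4 + b"A" * 16)
header = read_first_record(demo, record_size=16)
second = read_record(demo, 2, record_size=16)
```

Passing the record size as a parameter keeps the sketch independent of any particular index file, since each file has its own record length.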


Constants:
L_MNEMO = 16
WIDTH_KS = 40
SUBINLNG = 63
ACC_LENGTH = value ≥8 read at run-time when the database is opened


SUBSEQ one record for each parent or sub-sequence
name |length|  type  |pext  P:≤0 , S:>0 | plkey  | plinf        |    phase     |  h   |
     |      |to SMJYT| P: subseq list   | SHORTL |P: LOCUS      |100*code+frame|SUBSEQ|
                     | S: to EXTRACT    |        |S: feat start |       
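
Two SUBSEQ fields pack more than one piece of information: phase holds 100*code+frame, and the sign of pext distinguishes parent sequences (P: ≤0) from subsequences (S: >0). A minimal sketch of unpacking them follows; the function names split_phase and subseq_kind are hypothetical. How a non-positive pext is turned into a pointer to the subsequence list is not stated here, so the sketch does not attempt it.

```python
def split_phase(phase):
    """Split the SUBSEQ phase field (100*code + frame) into (code, frame)."""
    code, frame = divmod(phase, 100)
    return code, frame


def subseq_kind(pext):
    """Classify a SUBSEQ record from its pext field (P: <=0, S: >0)."""
    return "parent" if pext <= 0 else "subsequence"


print(split_phase(102))  # (1, 2): code 1, frame 2
print(subseq_kind(0))    # parent
print(subseq_kind(37))   # subsequence
```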

LOCUS one record for each parent sequence
sub   |pnuc|pinf| bef |next |spec     |host   |plref |molec|placc |stat | org | div |date|
SUBSEQ|    |    |LOCUS|LOCUS|N:SPECIES|SPECIES|SHORTL|SMJYT|SHORTL|SMJYT|SMJYT|     |    |
                            |P:SHORTL |

KEYWORDS and SPECIES one record for each keyword or taxon
name|libel|plsub| desc | syno   |    h   |plhost|
    |TEXT |LONGL|SHORTL|KEYWORDS|KEYWORDS|
                       |SPECIES |SPECIES |LONGL |
The last field, plhost, exists in SPECIES and is absent from KEYWORDS.

BIBLIO one record for each reference
name|plsub |plaut |  j  |  y  |
    |SHORTL|SHORTL|SMJYT|SMJYT|

AUTHOR one record for each author name
name|plref | fut  |
    |SHORTL|unused|

ACCESS one record for each accession number
name|plsub |
    |SHORTL|

SMJYT one record for each status, molecule, journal, year, type, organelle, division, and db structure information
name|plong|libel|
    |LONGL|TEXT |

EXTRACT (for nucleotide databases only) one record for each exon of each subsequence
mere  |deb|fin|pnuc| next  |
SUBSEQ|   |   |    |EXTRACT|

TEXT one 60-character record for each label of a species, keyword, or SMJYT
   label  |
In the case of species, labels may contain information about the correct genetic codes for this species and about the name of the taxonomic level (e.g., order, family).

LONGL one record for each group of SUBINLNG elements of a long list
sub[0],sub[1],...,sub[SUBINLNG-1] |next |
     SUBSEQ,...                   |LONGL|
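
With SUBINLNG = 63 and 4-byte fields, a LONGL record holds 64 integers, i.e. 256 bytes. The sketch below decodes one such record and follows the next field from record to record. Several points are assumptions not confirmed above: the little-endian byte order, the convention that next = 0 ends the chain, and the convention that zero-valued sub slots are empty. The function names and the records mapping (rank to raw bytes, standing in for the file) are hypothetical.

```python
import struct

SUBINLNG = 63  # from the Constants section

# 63 SUBSEQ pointers plus one `next` pointer, 4 bytes each.
LONGL_RECORD_SIZE = (SUBINLNG + 1) * 4


def parse_longl_record(raw, byte_order="<"):
    """Return (sub values, next rank) decoded from one raw LONGL record."""
    values = struct.unpack(f"{byte_order}{SUBINLNG + 1}I", raw)
    return list(values[:SUBINLNG]), values[SUBINLNG]


def walk_long_list(records, first_rank, byte_order="<"):
    """Yield the non-zero sub values of a long list.

    `records` maps a rank to the raw bytes of that LONGL record.
    Assumes next == 0 marks the end and zero slots are unused.
    """
    rank = first_rank
    seen = set()
    while rank != 0:
        if rank in seen:
            raise ValueError(f"cycle detected at LONGL record {rank}")
        seen.add(rank)
        subs, rank = parse_longl_record(records[rank], byte_order)
        for sub in subs:
            if sub != 0:
                yield sub
```

Writing walk_long_list as a generator means a caller can stop early without reading the rest of the chain.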

SHORTL one record for each element of a short list (in most cases)
val | next |
    |SHORTL|
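
Fields marked SHORTL in the tables above (for example plsub in ACCESS or plref in AUTHOR) point to the first record of such a list. A minimal sketch of collecting its elements follows. It assumes that next = 0 marks the end of the list, which is not stated above, and it models the file as a hypothetical dict mapping each rank to a (val, next) pair, with made-up values.

```python
def walk_short_list(shortl, first_rank):
    """Collect the val fields of a SHORTL chain starting at first_rank.

    `shortl` maps a record rank to its (val, next) pair.
    Assumes next == 0 terminates the list.
    """
    values = []
    seen = set()
    rank = first_rank
    while rank != 0:
        if rank in seen:
            raise ValueError(f"cycle detected at SHORTL record {rank}")
        seen.add(rank)
        val, rank = shortl[rank]
        values.append(val)
    return values


# Three chained records: 2 -> 5 -> 9 -> end of list.
shortl = {2: (17, 5), 5: (42, 9), 9: (8, 0)}
print(walk_short_list(shortl, 2))  # [17, 42, 8]
```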