The annotations of the SWISS-PROT + TrEMBL and EMBL subsets used in HOBACGEN are slightly modified to incorporate complementary data on families and protein domains.

SWISS-PROT + TrEMBL annotations

First, for each entry we add a line to the CC field giving the number of the family to which the sequence belongs:
CC   -!- GENE_FAMILY: HBG017522.
This number is also included among the keywords associated with the corresponding entry in the ACNUC database structure. As a result, all the sequences belonging to a family can be retrieved by this number with the retrieval system Query or its on-line version, WWW-Query.
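
For readers who process the flat files directly rather than through Query, the family number can be read from the CC line with a short script. The following is a minimal Python sketch, not part of HOBACGEN itself; the function name family_of_entry is hypothetical, and the pattern assumes that every GENE_FAMILY line follows the layout of the example above.

```python
import re

# Assumed layout, copied from the example line: "CC   -!- GENE_FAMILY: HBG017522."
GENE_FAMILY_RE = re.compile(r"^CC\s+-!-\s+GENE_FAMILY:\s*(HBG\d+)\.?\s*$")


def family_of_entry(entry_lines):
    """Return the family number of an entry, or None if it has no GENE_FAMILY line."""
    for line in entry_lines:
        match = GENE_FAMILY_RE.match(line)
        if match:
            return match.group(1)
    return None


# The only line of real data here is the CC line shown in the text above.
entry = ["CC   -!- GENE_FAMILY: HBG017522."]
print(family_of_entry(entry))  # prints HBG017522
```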

We also add data on the location of the PRODOM domains found in a given entry. These data are integrated into the FT field:

FT   PRODOM    begin    end       domain_ID       domain_#
FT   PRODOM        2     44       p99.1_24658     3
FT   PRODOM       45    361       p99.1_8718      8
FT   PRODOM      362    398       p99.1_133440    1
FT   PRODOM      399    494       p99.1_9971      7
The prefix of domain_ID (p99.1) indicates the PRODOM release from which these annotations were derived. The suffix (e.g. 24658) indicates the identifier of the domain in the corresponding PRODOM release.

The domain_# column gives the number of occurrences of this domain in the PRODOM database. Soon these features will be available as subsequences under the ACNUC structure.
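
The PRODOM feature lines can be read into records in the same way. The Python sketch below is an illustration only: the names ProdomFeature and parse_prodom_features are hypothetical, and it assumes that the columns are separated by whitespace as in the example and that the domain_ID is split at its last underscore into the release prefix and the domain identifier.

```python
import re
from typing import List, NamedTuple


class ProdomFeature(NamedTuple):
    begin: int        # first position of the domain in the sequence
    end: int          # last position of the domain in the sequence
    release: str      # PRODOM release prefix, e.g. "p99.1"
    domain: str       # identifier of the domain in that release, e.g. "24658"
    occurrences: int  # the domain_# column


# Assumed layout: FT PRODOM <begin> <end> <domain_ID> <domain_#>, whitespace separated.
PRODOM_RE = re.compile(r"^FT\s+PRODOM\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def parse_prodom_features(lines) -> List[ProdomFeature]:
    """Collect the PRODOM features of an entry, skipping every other line."""
    features = []
    for line in lines:
        match = PRODOM_RE.match(line)
        if match is None:
            continue  # not a PRODOM feature line (or the column header line)
        begin, end, domain_id, count = match.groups()
        # Split "p99.1_24658" at the last underscore into release and identifier.
        release, _, domain = domain_id.rpartition("_")
        features.append(
            ProdomFeature(int(begin), int(end), release, domain, int(count))
        )
    return features


lines = [
    "FT   PRODOM        2     44       p99.1_24658     3",
    "FT   PRODOM       45    361       p99.1_8718      8",
]
for feature in parse_prodom_features(lines):
    print(feature.release, feature.domain, feature.begin, feature.end)
```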

EMBL annotations

In this subset, for each coding sequence we add a qualifier giving the number of the family to which the gene belongs:
FT                   /gene_family="HBG017522"
This is the only modification we make to nucleotide sequences.
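
Since an EMBL entry may contain several coding sequences, a script reading the flat file would typically collect every /gene_family qualifier of the entry. The following Python sketch shows one way to do this; the function name families_in_embl_entry is hypothetical, and the pattern assumes the qualifier always sits alone on its FT line, as in the example above.

```python
import re

# Assumed layout, copied from the example line: FT   /gene_family="HBG017522"
GENE_FAMILY_QUALIFIER_RE = re.compile(r'^FT\s+/gene_family="(HBG\d+)"\s*$')


def families_in_embl_entry(entry_lines):
    """Return the family numbers found in an entry, in order of appearance."""
    families = []
    for line in entry_lines:
        match = GENE_FAMILY_QUALIFIER_RE.match(line)
        if match:
            families.append(match.group(1))
    return families


entry = ['FT                   /gene_family="HBG017522"']
print(families_in_embl_entry(entry))  # prints ['HBG017522']
```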
