Database of Homologous Sequences from Complete Genomes

Please visit the latest release.

HOGENOM release 02

November 2004

HOGENOM release 03 will be available soon (October 2005). Release information: Protein, Nucleotide

Previous release: here


Hogenom

HOGENOM is a database of homologous genes from fully sequenced organisms, structured under the ACNUC sequence database management system. It allows users to select sets of homologous genes among species and to visualize multiple alignments and phylogenetic trees. HOGENOM is therefore particularly useful for comparative sequence analysis, phylogeny, and molecular evolution studies. More generally, HOGENOM gives an overall view of what is known about a particular gene family.

Content

The database contains all protein sequences from the EBI proteomes data, with some additional annotation modifications. It also contains all the corresponding nucleotide sequences in Genome Reviews and EMBL. Homologous proteins are classified into families, and multiple alignments and phylogenetic trees are computed for each family.

Sequences and related information have been structured in an ACNUC database. A brief description of how the database is built is available here.

The present version of HOGENOM is release 02 (November 2004). It was built using sequences from the European Bioinformatics Institute proteome data (22nd July 2004). It contains a total of 626,687 protein sequences (and 772,359 CDS) classified into 200,639 families.

Among all the proteins included in this release, 480,015 (76.6%) are classified into 56,842 families containing at least two sequences, 143,796 (22.9%) are unique in their family and 2,876 (0.4%) partial proteins are not attached to a family.
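The breakdown above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts quoted on this page; the variable names and category labels are illustrative. It confirms that the three categories add up to the full protein set and recomputes each share. The last share works out to roughly 0.46%, so the 0.4% quoted above appears to be truncated rather than rounded.

```python
# Counts quoted above for HOGENOM release 02.
total_proteins = 626_687
categories = {
    "in families of two or more sequences": 480_015,
    "alone in their family": 143_796,
    "partial, not attached to a family": 2_876,
}

# The three categories should partition the full protein set.
assert sum(categories.values()) == total_proteins

# Share of each category, as a percentage of all proteins.
shares = {
    label: 100 * count / total_proteins
    for label, count in categories.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```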

New feature: Cross-references to Ensembl are now available in Hogenom protein annotations for human, mouse, rat, fly, and Caenorhabditis elegans.

Species	Homo sapiens	Mus musculus	Rattus norvegicus	Drosophila melanogaster	Caenorhabditis elegans
Hogenom	28,576	27,340	6,610	16,317	21,781
Ensembl	22,292	25,383	22,160	13,526	19,874
Hogenom associated to Ensembl	89%	93%	85%	88%	52%
Ensembl associated to Hogenom	84%	80%	23%	95%	57%

Example:

ID   ANPA_HUMAN     STANDARD;      PRT;  1061 AA.
AC   P16066;
DT   01-APR-1990 (Rel. 14, Created)
DT   01-APR-1990 (Rel. 14, Last sequence update)
...
...
DR   ENSEMBL:Homo_sapiens;ENSG00000169418;ENST00000306672;ENSP00000305386.
...
...
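A cross-reference line like the one in the example entry can be split into its parts programmatically. The sketch below is only an illustration: the field layout (species, then gene, transcript, and protein identifiers) is inferred from the single example above and from the ENSG/ENST/ENSP prefixes, and the names `EnsemblXref` and `parse_ensembl_dr` are hypothetical, not part of HOGENOM or ACNUC.

```python
from typing import NamedTuple, Optional


class EnsemblXref(NamedTuple):
    """Fields of an Ensembl DR line (layout assumed from the example)."""
    species: str
    gene_id: str
    transcript_id: str
    protein_id: str


def parse_ensembl_dr(line: str) -> Optional[EnsemblXref]:
    """Parse one 'DR   ENSEMBL:...' line; return None for any other line."""
    if not line.startswith("DR"):
        return None
    payload = line[2:].strip()
    prefix = "ENSEMBL:"
    if not payload.startswith(prefix):
        return None
    # Drop the trailing period, then split the semicolon-separated fields.
    fields = payload[len(prefix):].rstrip(".").split(";")
    if len(fields) != 4:
        return None
    return EnsemblXref(*(field.strip() for field in fields))


example = "DR   ENSEMBL:Homo_sapiens;ENSG00000169418;ENST00000306672;ENSP00000305386."
xref = parse_ensembl_dr(example)
print(xref)
```

Lines that do not match the assumed layout return None, so the function can be applied to every line of an entry without special-casing.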

Fully Sequenced Organisms

There are 182 organisms in HOGENOM, of which 13 are eukarya (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Encephalitozoon cuniculi, Eremothecium gossypii, Guillardia theta, Homo sapiens, Mus musculus, Oryza sativa, Plasmodium falciparum, Rattus norvegicus, Saccharomyces cerevisiae, Schizosaccharomyces pombe) and 20 are archaea.

Among all the proteins, 24% (150,459) belong to eukarya, 69% (434,524) belong to bacteria and 6% (41,704) belong to archaea.

WWW access

The database can be queried on this server through the WWW-Query and Cross-Taxa systems. Note that HOGENOM is split into two databases on this server: HOGENPROT contains the protein sequences from EBI proteome data, while HOGENNUCL contains the nucleotide sequences from EMBL.

FamFetch Graphical User Interface

FamFetch is a graphical interface specifically developed to query databases of gene families. It is based on a client/server architecture: to access the database, you only need to install the FamFetch application on your computer. This program, written in Java, integrates a GUI that allows users to easily access and visualize:

The list of the families available in the database.
The sequence (protein or nucleotide) of the genes defining these families.
The alignments built with these families.
The phylogenetic trees computed with these alignments.

In FamFetch phylogenetic trees, genes are colored according to the species from which they come. Users can modify the color table according to the taxa (at any taxonomic level) they are interested in. This color table is saved in a preferences file (named .famfetch on UNIX, FamFetch.Prefs on Mac OS, HobacFetch.ini on Windows systems). The color table installed by default with FamFetch is dedicated to prokaryotes (for the HOBACGEN database). You can replace this preferences file with the one we have prepared for HOGENOM, which is available here.

Server mirroring

You don't need to install the server itself to use HOGENOM from your computer; the client is enough for that purpose. However, you may want to set up your own server to speed up your database access and to offer the service to potential users in your geographic area. To install a HOGENOM server, you first need to register. The results page shown after registration gives access to the server installation procedure.

The whole database is available from our FTP server at URL:

ftp://pbil.univ-lyon1.fr/pub/hogenom/

Note that it is much more efficient to use a dedicated FTP client to download the database rather than an Internet Web browser.
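For scripted access, the directory can also be listed with Python's standard `ftplib` module. This is a minimal sketch, not an official download procedure: it assumes the server accepts anonymous logins and is still reachable at the URL above, neither of which is stated on this page. The helper names `split_ftp_url` and `list_remote_files` are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL quoted above on this page.
HOGENOM_FTP_URL = "ftp://pbil.univ-lyon1.fr/pub/hogenom/"


def split_ftp_url(url):
    """Return (host, path) for an ftp:// URL."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def list_remote_files(url=HOGENOM_FTP_URL, timeout=30):
    """List the names in the remote directory.

    Assumes anonymous login is permitted (an assumption, not documented here).
    Requires network access; nothing is contacted until this is called.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments: anonymous login
        ftp.cwd(path)    # move to the HOGENOM directory
        return ftp.nlst()
```

Calling `list_remote_files()` returns the file names in the directory; individual files could then be fetched with `FTP.retrbinary`.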

Important note: SWISS-PROT entries such as those found in HOGENOM are copyrighted. They are produced through a collaboration between the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. There are no restrictions on their use by non-profit institutions as long as their content is in no way modified. Usage by and for commercial entities requires a license agreement (See or send an email to license@isb-sib.ch).

Contact and reference

If you encounter any problems when installing or using HOGENOM, please contact Laurent Duret. We also welcome any comments or suggestions on the database and/or its interface.

Acknowledgements

This project is supported by the European Commission (TEMBLOR, contract no. QLRI-CT-2001-00015, RTD programme "Quality of Life and Management of Living Resources") as part of the Integr8 project, and by the Rhone-Alpes region as part of "Projet Thematiques Prioritaires".

Calculations have been done at the IN2P3 Computing Center.

