Homologous Sequences in Complete Genomes Database

HOGENOM release 01

Updated on 8th September 2003

Please use the latest release

Release information: Protein Nucleotide

Query Hogenom

Search Keyword	Protein sequences	Nucleotide sequences
Search Sequence	Protein sequences	Nucleotide sequences

Hogenom

HOGENOM is a database of homologous genes from fully sequenced organisms, structured under the ACNUC sequence database management system. It allows one to select sets of homologous genes among species, and to visualize multiple alignments and phylogenetic trees. HOGENOM is thus particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOGENOM gives an overall view of what is known about a particular gene family.

Content

The database itself contains all protein sequences from the EBI proteomes data, with some data corrected, clarified or completed (notably to address the problems of redundancy and orthology/paralogy) and with some annotation modifications. It also contains all the corresponding nucleotide sequences in EMBL. Homologous proteins are classified into families, and multiple alignments and phylogenetic trees are computed for each family.

Sequences and related information have been structured in an ACNUC database. A description of how the database is built is available here.

The present version of HOGENOM is release 01 (September 2003). It has been built using sequences from the European Bioinformatics Institute proteome data (18 June 2003). It contains a total of 423,577 protein sequences (and 527,928 nucleotide sequences) classified in 157,279 families.

Among all the proteins included in this release, 305,514 (72%) are classified into 41,907 families containing at least two sequences, 115,373 (27%) are unique in their family and 2,690 (1%) partial proteins are not attached to a family.
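The percentages quoted above follow from the protein counts given for this release. The short sketch below recomputes them; the variable names and category labels are illustrative only and are not part of HOGENOM.

```python
# Protein counts quoted in the text for release 01.
total_proteins = 423_577
breakdown = {
    "multi_sequence_families": 305_514,  # families with at least two sequences
    "unique_in_family": 115_373,         # single-sequence families
    "partial_unattached": 2_690,         # partial proteins without a family
}

# The three categories account for every protein in the release.
assert sum(breakdown.values()) == total_proteins

# Share of each category, rounded to the nearest percent.
shares = {
    label: round(100 * count / total_proteins)
    for label, count in breakdown.items()
}

for label, percent in shares.items():
    print(f"{label}: {percent}%")
```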

The size distribution of the HOGENOM families is described here.

Fully Sequenced Organisms

The list of the organisms in HOGENOM is given here. There are 117 organisms, among which 10 are eukarya (Guillardia theta, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Encephalitozoon cuniculi, Saccharomyces cerevisiae, Homo sapiens, Schizosaccharomyces pombe) and 16 are archaea.

Among all the proteins, 31% belong to eukarya, 60% belong to bacteria and 9% belong to archaea.

Graphical User Interface

The HOGENOM interface is based on a client/server architecture. To access the database you only need to install the FamFetch application on your computer. This program, written in Java, provides a GUI that allows users to easily access and visualize:

The list of the families available in the database.
The sequence (protein or nucleotide) of the genes defining these families.
The alignments built with these families.
The phylogenetic trees computed with these alignments.

In FamFetch phylogenetic trees, genes are colored according to the species from which they come. Users can modify the color table according to the taxa (at any taxonomic level) they are interested in. This color table is saved in a preferences file (named .hobacfetch on UNIX, HobacFetch.Prefs on Mac OS, HobacFetch.ini on Windows systems). The color table installed by default with FamFetch is dedicated to prokaryotes (for the HOBACGEN database). You can replace this preferences file with the one we have prepared for HOGENOM, which is available here.
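As a convenience when replacing the preferences file, the following sketch picks the file name given above for the current platform. It is not part of FamFetch: the function name is hypothetical, the mapping of Python's "Darwin" platform name to the Mac OS file name is an assumption, and the directory in which FamFetch stores the file is not stated here, so only the file name is returned.

```python
import platform

# File names as listed in the text; the dictionary keys are illustrative.
PREFS_FILENAMES = {
    "unix": ".hobacfetch",
    "macos": "HobacFetch.Prefs",
    "windows": "HobacFetch.ini",
}


def prefs_filename(system=None):
    """Return the FamFetch preferences file name for a platform name.

    `system` uses the values returned by platform.system()
    ("Windows", "Darwin", "Linux", ...). Treating "Darwin" as Mac OS
    and everything else as UNIX is an assumption of this sketch.
    """
    system = system or platform.system()
    if system == "Windows":
        return PREFS_FILENAMES["windows"]
    if system == "Darwin":
        return PREFS_FILENAMES["macos"]
    return PREFS_FILENAMES["unix"]


print(prefs_filename())
```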

WWW access

It is also possible to query the database on this server through the WWW-Query and Cross-Taxa systems. Note that HOGENOM is split into two databases on this server: HOGENPROT contains the protein sequences from EBI proteome data, while HOGENNUCL contains the nucleotide sequences from EMBL.

Server mirroring

You don't need to install the server itself to have HOGENOM running on your computer, as the client is enough for that purpose. However, you may want to set up your own server in order to speed up your database access and to offer that service to potential users in your geographic area. To install a HOGENOM server, you first need to register. From the registration results page, you will have access to the server installation procedure.

The whole database is available from our FTP server at URL:

ftp://pbil.univ-lyon1.fr/pub/hogenom/

Note that it is much more efficient to use a dedicated FTP client to download the database rather than an Internet Web browser.
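For scripted downloads, a dedicated client can also be written in a few lines. The sketch below uses Python's standard ftplib module and assumes the server accepts anonymous logins and that the directory layout is flat. The function names and the ftp_factory parameter are illustrative, and the files actually present on the server are not listed here.

```python
from ftplib import FTP, error_perm
from pathlib import Path
from urllib.parse import urlparse

HOGENOM_URL = "ftp://pbil.univ-lyon1.fr/pub/hogenom/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory)."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def fetch_directory(url, dest, ftp_factory=FTP):
    """Download every plain file in the remote directory into `dest`.

    Entries that cannot be retrieved as files (for example
    subdirectories) are skipped. Returns the names of fetched files.
    """
    host, remote_dir = split_ftp_url(url)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    with ftp_factory(host) as ftp:
        ftp.login()  # anonymous login (assumed to be allowed)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            target = dest / name
            try:
                with open(target, "wb") as handle:
                    ftp.retrbinary(f"RETR {name}", handle.write)
            except error_perm:
                target.unlink()  # remove the empty placeholder file
                continue
            fetched.append(name)
    return fetched


# Example call (requires network access to the FTP server):
# fetch_directory(HOGENOM_URL, "hogenom_release")
```

The ftp_factory parameter exists only so that the function can be exercised without a network connection by passing a stand-in class.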

Important note: SWISS-PROT entries, such as those found in HOGENOM, are copyrighted. They are produced through a collaboration between the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. There are no restrictions on their use by non-profit institutions as long as their content is in no way modified. Usage by and for commercial entities requires a license agreement (See or send an Email to license@isb-sib.ch).

Contact and reference

If you encounter any problems when installing or using HOGENOM, please contact Laurent Duret. We also welcome any comments or suggestions on the database and/or its interface.

Acknowledgements

This project is supported

by the European Commission (TEMBLOR, contract-no. QLRI-CT-2001-00015, RTD programme "Quality of Life and Management of Living Resources") as a part of the Integr8 project and
by the Rhône-Alpes region as part of "Projet Thematiques Prioritaires".
Calculations were performed at the IN2P3 Computing Center.
