HOBACGEN: Homologous Bacterial Genes Database

WARNING: HOBACGEN has now been replaced by the HOGENOM database. HOGENOM contains data from all complete genomes, including bacteria and archaea. The HOBACGEN database is therefore no longer maintained.
HOBACGEN is a database system that contains all the protein sequences of bacteria organized into families. It allows one to select sets of homologous genes from bacterial species and to visualize multiple alignments and phylogenetic trees. Thus HOBACGEN is particularly useful for comparative genomics, phylogeny and molecular evolution studies on bacteria.


The database contains all sequences of bacteria (eubacteria and archaea) and yeast taken from SWISS-PROT + TrEMBL (now the UniProt Knowledgebase), with some annotation modifications. It also contains all the corresponding nucleotide sequences in EMBL. Homologous proteins are classified into families, and multiple alignments and phylogenetic trees are computed for each family. The description of how the database is built is available here.

The present version of HOBACGEN is release 10 (February 2002). It was built using sequences from SWISS-PROT 40, TrEMBL 19 and TrEMBL_NEW - January 25, 2002 (now the UniProt Knowledgebase). It contains a total of 260,025 proteins, distributed as follows:

Bacteria 226,811 proteins
Archaea 26,406 proteins
Yeast 6,808 proteins

Among all the proteins included in this release, 193,747 (74.5%) are classified into 23,961 families containing at least two sequences, 58,445 (22.5%) are unique in their family, and 7,833 (3%) partial proteins are not attached to a family.
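The figures above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the dictionary names and the `percent` helper are illustrative and not part of HOBACGEN or its tools; the counts are those quoted for release 10.

```python
# Release 10 protein counts, as quoted above.
by_taxon = {"Bacteria": 226_811, "Archaea": 26_406, "Yeast": 6_808}
by_status = {
    "in families of two or more": 193_747,
    "unique in their family": 58_445,
    "partial, not attached to a family": 7_833,
}

# Both breakdowns should add up to the same release total.
total = sum(by_taxon.values())
assert total == sum(by_status.values())


def percent(count, whole):
    """Share of `whole` represented by `count`, rounded to one decimal."""
    return round(100 * count / whole, 1)


shares = {label: percent(n, total) for label, n in by_status.items()}

print(f"total proteins: {total}")
for label, share in shares.items():
    print(f"{label}: {share}%")
```

Both breakdowns sum to the same total, and the rounded shares agree with the percentages given in the text.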

Important note: SWISS-PROT entries, such as those found in HOBACGEN, are copyrighted. They are produced through a collaboration between the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. There are no restrictions on their use by non-profit institutions as long as their content is in no way modified. Usage by and for commercial entities requires a license agreement (See or send an e-mail to license@isb-sib.ch).

Graphical User Interface

The HOBACGEN interface is based on a client/server architecture. To access the database, you only need to install the FamFetch application on your computer. This program, written in Java, integrates a GUI that allows users to easily access and visualize:

WWW access

It is also possible to query the database on this server through the WWW-Query system. Note that HOBACGEN is split into two databases on this server: HOBACGEN contains the protein sequences from SWISS-PROT + TrEMBL (now the UniProt Knowledgebase), while HOBACGENDNA contains the nucleotide sequences from EMBL.

Server mirroring

You don't need to install the server itself to have HOBACGEN running on your computer, as the client is enough for that purpose. On the other hand, you may want to set up your own server to speed up your database access and to offer that service to potential users in your geographic area. To install a HOBACGEN server, you first need to register. The registration results page gives you access to the server installation procedure.

Contact and reference

If you encounter any problems when installing or using HOBACGEN, please contact Guy Perrière or Laurent Duret. We also welcome any comments or suggestions on the database and/or its interface.

If you use HOBACGEN in any published work, please cite the following reference:

Perrière, G., Duret, L. and Gouy, M. (2000) HOBACGEN: database system for comparative genomics in bacteria. Genome Res., 10, 379-385.


This project is supported by the European Commission (TEMBLOR, contract no. QLRI-CT-2001-00015, RTD programme "Quality of Life and Management of Living Resources") as a part of the Integr8 project.