Hoppsigen

(Homologous Processed Pseudogenes database rel 3.0)


Hoppsigen is a nucleic acid database of homologous processed pseudogenes. The database is developed at the PBIL (Pôle Bioinformatique Lyonnais).

What is a processed pseudogene?

Processed pseudogenes are retroelements (like SINEs and LINEs). They are generated in two ways:

Compared with their functional homologous genes, processed pseudogenes have lost their introns and have no promoter (except those inserted near an existing promoter sequence).

Main interest for molecular evolution

Genes that give rise to processed pseudogenes must be expressed in germ-line cells, so identifying processed pseudogenes could help pinpoint the genes expressed in these cells. Because they lack promoters, processed pseudogenes are generally no longer transcribed, and they quickly accumulate mutations. This is why we consider these sequences non-functional: they are no longer subject to selective constraints. They can therefore be used to estimate silent substitution rates along genomes in relation to the GC content of the genomic regions where they inserted.
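As an illustration of the GC content measure mentioned above, here is a minimal Python sketch. It is not part of Hoppsigen: the function names, the flank size of 1000 bases and the choice to ignore ambiguous bases are all assumptions made for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def flanking_gc(genome: str, start: int, end: int, flank: int = 1000) -> float:
    """GC content of the regions around genome[start:end].

    The inserted element itself (genome[start:end]) is excluded, so the
    value describes the region where the element inserted.
    The flank size is an arbitrary choice for this sketch.
    """
    left = genome[max(0, start - flank):start]
    right = genome[end:end + flank]
    return gc_content(left + right)
```

For example, `flanking_gc(chromosome, start, end)` would give the GC content of the 1000 bases on each side of a pseudogene located at `chromosome[start:end]`, where `chromosome` is a hypothetical string holding the genomic sequence.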

Processed pseudogene identification

Processed pseudogenes were identified by searching the complete mouse and human genomes for sequences similar to genes with introns. We successively used two methods, BLAST (Altschul et al., 1997) and SIM (Huang and Miller, 1991), to identify all retroelements generated from reverse-transcribed genes with introns. Within these retroelements, we then detected non-functional sequences and annotated them as processed pseudogenes.
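The criteria used to decide that a retroelement is non-functional are not detailed here. As a purely illustrative sketch, the Python code below shows two common signals of a disrupted coding sequence: a length that is not a multiple of three and an in-frame stop codon before the end. Both function names are hypothetical, and these checks are assumptions made for the example, not the Hoppsigen annotation procedure.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    found before the last complete codon, or None if there is none.

    The sequence is assumed to be ungapped and to start in frame 0.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The last codon is skipped: a stop there is the normal terminator.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def looks_disrupted(cds: str) -> bool:
    """True if the length breaks the reading frame or a premature stop is present."""
    return len(cds) % 3 != 0 or first_premature_stop(cds) is not None
```

A sequence that passes both checks is not necessarily functional; the sketch only flags the most obvious disruptions.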

A database of retroelements and processed pseudogenes

We identified 12236 human retroelements and 11391 mouse retroelements. These retroelements were annotated and stored in the HOPPSIGEN (Homologous processed pseudogenes) database. Sequences were grouped into families according to their homology. The database contains 9821 families of retroelements: 4941 human and 4880 mouse. Among the retroelements, 5961 human and 5000 mouse sequences were annotated as processed pseudogenes. The database also contains the functional genes from ENSEMBL 8.3 that are homologous to Hoppsigen retroelements.

For each family we calculated a multiple alignment between the functional gene and its homologous retroelements using CLUSTAL W 1.7 (Thompson et al., 1994). We also calculated a phylogenetic tree from this alignment using the NJ method implemented in CLUSTAL W with K2 distances (Kimura, 1982).
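To make the K2 distance concrete, here is a minimal Python sketch of Kimura's two-parameter formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name is hypothetical, and skipping every column that contains a gap or an ambiguous base is an assumption of this example; CLUSTAL W may treat such columns differently.

```python
import math

NUCLEOTIDES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: columns with gaps or ambiguous bases are ignored.
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2 correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

The distances obtained for every pair of sequences in a family form the matrix from which a neighbor-joining tree can be built.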

Querying the database

Hoppsigen is structured under the ACNUC sequence database management system. The Query program lets you browse the database flat files and select sets of homologous pseudogenes from different species according to specific criteria. If you would like to obtain the flat files, please contact me.

The database is also available over the Internet through the WWW-Query query engine. To access the database, first select "search in nucleic databases" or "search by families", then choose Hoppsigen in the list of available databases.

After choosing the database, you can query it using various keywords or, if a sequence name or a list of sequence names is known, retrieve the corresponding sequences directly.

Here is a list of simple keywords:

A complete list of keywords and annotation descriptions is available in this file.

Contact

If you encounter problems when installing or using Hoppsigen, please contact Adel Khelifi or Dominique Mouchiroud. We also welcome any comments or suggestions on the database and/or its interface.

If you want to use Hoppsigen in any published work, please send an e-mail to Adel Khelifi beforehand.

