Hoppsigen

(Homologous Processed Pseudogenes database, release 4.0)

Hoppsigen is a nucleotide sequence database of homologous processed pseudogenes. The database is developed at the PBIL (Pôle Bioinformatique Lyonnais).

What is a processed pseudogene?

Processed pseudogenes are retroelements (like SINEs and LINEs). They arise in two ways:

Reverse transcription of messenger RNA in germ line cells.
Duplication of an existing processed pseudogene in germ line cells.

Consequently, compared with their functional homologs, processed pseudogenes lack introns and have no promoter (except those that happen to be inserted near an existing promoter).

Main interests for molecular evolution

Genes giving rise to processed pseudogenes must be expressed in germ line cells, so identifying these pseudogenes could help to determine precisely which genes are expressed in those cells. Because they lack promoters, processed pseudogenes are generally no longer transcribed, and they quickly accumulate mutations. For this reason we consider these sequences to be non-functional and no longer subject to selective constraints. They can therefore be used to estimate silent substitution rates along genomes in relation to the local GC content.
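Relating substitution rates to local GC content requires a GC measure for each genomic region. The following is only a minimal Python sketch of one such measure, GC fraction in consecutive non-overlapping windows; it is not the procedure used to build Hoppsigen, and the function names and the default window size of 1,000 bases are hypothetical choices made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Gaps, N and other ambiguity codes are ignored; returns NaN if the
    sequence contains no unambiguous base.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int = 1000) -> list[tuple[int, float]]:
    """GC fraction in consecutive non-overlapping windows.

    Returns (start position, GC fraction) pairs. The default window size
    is a hypothetical illustration, not a Hoppsigen parameter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq), window)]


if __name__ == "__main__":
    # Toy sequence (not real data): a GC-balanced stretch then a GC-rich one.
    region = "ATGC" * 300 + "GGCC" * 200
    for start, gc in windowed_gc(region, window=500):
        print(start, round(gc, 3))
```

Non-overlapping windows keep the sketch simple; a sliding window with a step smaller than the window size would give a smoother GC profile at the cost of correlated values.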

Identification of processed pseudogenes

Processed pseudogenes were identified by searching the complete human and mouse genomes for sequences similar to intron-containing genes. We successively used two methods, BLAST (Altschul et al., 1997) and SIM (Huang and Miller, 1991), to identify all retroelements derived from reverse-transcribed intron-containing genes. Among these retroelements, we then detected the non-functional sequences and annotated them as processed pseudogenes.

A database of retroelements and processed pseudogenes

We identified 5,823 human and 3,934 mouse retroelements. These retroelements were annotated and stored in the HOPPSIGEN database (Homologous processed pseudogenes). Sequences were grouped into families according to their homologies. The database contains 3,168 families made up exclusively of human (1,966) or mouse (1,202) retroelements, and 323 families containing both human and mouse retroelements. Of these retroelements, 5,206 human and 3,428 mouse retroelements were annotated as processed pseudogenes. The database also contains the ENSEMBL functional genes homologous to Hoppsigen retroelements.

For each family, we computed a multiple alignment of the functional gene and its homologous retroelements using CLUSTAL W 1.7 (Thompson et al., 1994). We also computed a phylogenetic tree from this alignment using the neighbor-joining method (Saitou and Nei, 1987) as implemented in CLUSTAL W, with Kimura two-parameter (K2) distances (Kimura, 1980).
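The K2 distance separates transitions (A<->G, C<->T) from transversions (purine <-> pyrimidine). With P and Q the proportions of compared sites showing a transition and a transversion, Kimura's formula is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below only illustrates this formula for one pair of aligned sequences; it is not CLUSTAL W's code, and skipping every site with a gap or ambiguous base in either sequence is an assumption of the sketch.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Only sites where both sequences carry A, C, G or T are compared
    (assumption: gaps and ambiguity codes are simply skipped).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        # The logarithms are undefined for saturated (too divergent) pairs.
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Toy pair differing by one transition over 10 sites (P = 0.1, Q = 0).
    print(round(kimura2p("ACGTACGTAC", "GCGTACGTAC"), 4))  # → 0.1116
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances; that step is left to a dedicated tool such as CLUSTAL W.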

Querying the database

Hoppsigen is structured under the ACNUC sequence database management system. The Query program allows users to browse the database flat files and to select sets of homologous pseudogenes from different species according to specific criteria.

The database is also available on the Internet (flat files and FASTA sequences) through the WWW-Query engine. To access the database, first select Search in Nucleotid databank or Search for families, alignments and trees, then choose Hoppsigen from the list of available databases.

Requests can then be made using various keywords or, if the name of a sequence (or a list of sequence names) is known, the corresponding sequences can be retrieved directly.
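Sequences retrieved in FASTA format can be loaded for local analysis with a small reader. This is a minimal Python sketch assuming standard FASTA (header lines starting with ">", sequence on the following lines); it keeps only the first token of each header as the name and does not interpret any Hoppsigen-specific header content. The function name read_fasta is hypothetical.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA text stream.

    The name is the first whitespace-separated token of the header line
    (assumption: the rest of the header is ignored). Blank lines are skipped,
    and sequence lines appearing before the first header are discarded.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Toy records, not real Hoppsigen entries.
    demo = io.StringIO(">seq1 toy record\nACGT\nACGT\n>seq2\nGGCC\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```

Because it is a generator, the reader processes one record at a time, so large downloads do not need to fit in memory; wrap it in dict() when random access by name is needed.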

Here is a list of simple keywords:

ppgene: selects all retroelements. Combined with a species or taxon name, it extracts the retroelements of one or a few species.
CDS: selects only the coding sequences of the functional genes.
CDE: selects only the region of the pseudogene homologous to the CDS.
5'FL (or 3'FL): selects the regions of the pseudogenes similar to the 5'NCR (or 3'NCR) of the functional gene.

A complete list of keywords and annotation descriptions is available in this file.

Some useful example requests are also available here.

A list of the 51 human and 27 mouse HOPPSIGEN families containing more than 10 sequences is also available here.

A list of the 323 HOPPSIGEN families shared by human and mouse is available here.

Contact

If you encounter any problems when installing or using Hoppsigen, please contact Adel Khelifi: khelifiNOSPAMbiomserv.univ-lyon1.fr or Dominique Mouchiroud: mouchiNOSPAMbiomserv.univ-lyon1.fr (replace NOSPAM by @). We also welcome any comments or suggestions on the database and/or its interface.
