Hoppsigen
(Homologous Processed Pseudogenes database, release 4.0)
Hoppsigen is a nucleotide database of homologous processed pseudogenes. The database
is developed at the PBIL (Pôle Bioinformatique Lyonnais).
What is a processed pseudogene?
Processed pseudogenes are retroelements (like SINEs and LINEs). They are generated in
two ways:
- Reverse transcription of messenger RNA in germ-line cells.
- Duplication of an existing processed pseudogene in germ-line cells.
Compared with their functional homologs, processed pseudogenes have therefore lost
their introns and have no promoter (except for processed pseudogenes inserted near an
existing promoter).
Main interests for molecular evolution
Genes that give rise to processed pseudogenes must be expressed in germ-line cells,
so identifying these pseudogenes could help pinpoint which genes are expressed in
these cells. Because processed pseudogenes lack promoters, they are generally no longer
transcribed and quickly accumulate mutations. For this reason we consider these
sequences non-functional: they are no longer subject to selective constraints.
They can therefore be used to estimate silent substitution rates along genomes in
relation to local GC content.
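Relating pseudogene divergence to local GC content first requires measuring GC content along a sequence. The following is a minimal illustration only, not the procedure used to build Hoppsigen: it computes the G+C fraction in sliding windows and ignores N and gap characters. The function names and the window and step sizes are hypothetical choices.

```python
def gc_content(seq):
    """G+C fraction among A, C, G, T; other symbols (N, gaps) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # no informative bases in this stretch
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """(start, GC fraction) for successive windows; starts are 0-based.

    A trailing stretch shorter than one window is not reported separately.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    return [(s, gc_content(seq[s:s + window])) for s in range(0, last_start + 1, step)]
```

For example, `windowed_gc(sequence, 1000, 500)` would report GC content in 1 kb windows overlapping by half; such values could then be paired with substitution estimates for pseudogenes lying in each window.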
Processed pseudogene identification
Processed pseudogenes were identified by searching the complete human and mouse genomes
for sequences similar to intron-containing genes. We successively used two methods,
BLAST (Altschul et al., 1997) and SIM (Huang and Miller, 1991), to identify all
retroelements generated by reverse transcription of intron-containing genes.
Within these retroelements, we then detected non-functional sequences and annotated
them as processed pseudogenes.
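This page does not state which criteria were used to call a retroelement non-functional. One commonly used signal is a disrupted reading frame. The following hypothetical sketch flags a CDE region (the part homologous to the parent CDS) as disabled if it contains a premature in-frame stop codon or if its length is not a multiple of three. It assumes the region starts in the parent gene's reading frame and does no indel-aware realignment, so it is an illustration rather than the Hoppsigen annotation rule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cde):
    """0-based codon indices of in-frame stop codons, excluding the final codon."""
    cde = cde.upper()
    n_codons = len(cde) // 3
    return [i for i in range(n_codons - 1) if cde[3 * i:3 * i + 3] in STOP_CODONS]


def looks_disabled(cde):
    """Hypothetical rule: a premature stop or a length that is not a
    multiple of 3 (a possible frameshift) marks the region as disabled."""
    return bool(premature_stops(cde)) or len(cde) % 3 != 0
```

A real pipeline would compare the pseudogene with its functional homolog in an alignment, so that frameshifting insertions and deletions are located precisely rather than inferred from total length.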
A database of retroelements and processed pseudogenes
We identified 5,823 human and 3,934 mouse retroelements. These retroelements were
annotated and stored in the HOPPSIGEN database (Homologous processed pseudogenes).
Sequences were grouped into families according to their homology. The database contains
3,168 families made up exclusively of human (1,966) or mouse (1,202) retroelements, and
323 families containing both human and mouse retroelements. Of these retroelements,
5,206 human and 3,428 mouse sequences were annotated as processed pseudogenes. The
database also contains the functional genes from ENSEMBL that are homologous to
Hoppsigen retroelements.
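The exact homology criterion used to build families is not described here. As an illustration only, the following sketch assumes a list of pairwise homology hits (a hypothetical input) and groups sequences by single-linkage clustering with a union-find structure, so that any chain of hits places sequences in the same family. Each family can then be labelled as single-species or shared, mirroring the human-only, mouse-only and shared families counted above.

```python
def group_families(names, homologous_pairs):
    """Single-linkage grouping: sequences connected by any chain of hits share a family."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two families

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return list(families.values())


def family_kind(family, species_of):
    """Return the single species of a family, or "shared" if it mixes species."""
    species = {species_of[n] for n in family}
    return "shared" if len(species) > 1 else species.pop()
```

Single linkage is only one possible choice; it can merge distantly related sequences through intermediates, which a stricter criterion would avoid.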
For each family, we computed a multiple alignment of the functional gene and its
homologous retroelements using
CLUSTAL W 1.7
(Thompson et al., 1994). From each alignment we also built a phylogenetic tree with the
neighbor-joining method (Saitou and Nei, 1987), as implemented in CLUSTAL W, using
Kimura two-parameter (K2) distances (Kimura, 1980).
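The K2 distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites showing transition and transversion differences. The following is a minimal sketch of that formula, assuming gaps and ambiguous bases are simply skipped site by site; CLUSTAL W's own gap handling may differ, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: the correction is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Computing this distance for every pair of sequences in a family gives the distance matrix from which a neighbor-joining tree is built.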
Querying the database
Hoppsigen is structured under the ACNUC sequence database management system. The Query
program lets users browse the database flat files and select sets of homologous
pseudogenes from different species according to specific criteria.
The database is also available on the Internet (flat files and FASTA sequences) through
the WWW-Query engine. To access the database, first select "Search in Nucleotid
databank" or "Search for families, alignments and trees", then choose Hoppsigen from
the list of available databases.
You can then query the database using various keywords or, if a sequence name or a
list of sequence names is known, retrieve the corresponding sequences.
Here is a list of simple keywords:
- ppgene: selects all retroelements. Combined with a species or a taxon, it
extracts the retroelements of one or a few species.
- CDS: selects only the coding sequences of the functional genes.
- CDE: selects only the region of the pseudogene homologous to the CDS.
- 5'FL (or 3'FL): selects the regions of the pseudogenes similar to
the 5'NCR (or 3'NCR) of the functional gene.
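Conceptually, combining ppgene with a species criterion amounts to intersecting two sets of sequences. The following sketch illustrates that idea with a hypothetical in-memory record type; it does not reproduce the ACNUC flat-file format or its query syntax, and it handles a single species rather than a full taxon.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a database entry (not the ACNUC format)."""
    name: str
    species: str
    keywords: set = field(default_factory=set)


def select(records, keyword, species=None):
    """Names of records carrying `keyword`, optionally restricted to one species."""
    hits = {r.name for r in records if keyword in r.keywords}
    if species is not None:
        hits &= {r.name for r in records if r.species == species}
    return hits
```

Selecting a taxon instead of a species would additionally need the taxonomy to expand the taxon into its member species.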
A complete list of keywords and annotation descriptions is available in this
file.
Some useful example requests are also available
here.
A list of the 51 human and 27 mouse HOPPSIGEN families containing more than 10 sequences is also available
here.
A list of the 323 HOPPSIGEN families shared by human and mouse is available
here.
Contact
If you encounter problems installing or using Hoppsigen, please contact Adel
Khelifi: khelifiNOSPAMbiomserv.univ-lyon1.fr or Dominique
Mouchiroud: mouchiNOSPAMbiomserv.univ-lyon1.fr (replace NOSPAM by @). We also welcome
any comments or suggestions on the database and/or its interface.