What is PhEVER?
PhEVER is a database of homologous gene families that provides
information for understanding virus/host co-evolution. Each
family comes with a pre-computed alignment and phylogeny.
How can I use PhEVER?
PhEVER allows you to select sets of homologous genes among viral
species as well as between viral species and some eukaryotes.
Multiple alignments and phylogenetic trees are
available for each family. This makes it a particularly useful tool
for comparative sequence analysis, phylogeny and molecular evolution
studies in the context of co-evolution.
The user manual explains how to query PhEVER from the web
interface.
For more details on how to query PhEVER, please take a look at the search page.
Which organisms are present in PhEVER?
The current release integrates extensive data from up-to-date
completely sequenced genomes spanning a wide taxonomic range (2426
non-redundant viral genomes, 1296 non-redundant bacterial genomes,
44 eukaryotic genomes from plants to human). PhEVER is built from
the following genomic data sources:
- RefSeq Viral:
  - complete data (i.e. 2426 genomes of viruses from all 5
    Baltimore classes). Retrieved in May 2010.
- Ensembl:
  - Aedes aegypti
  - Anopheles gambiae
  - Bos taurus
  - Caenorhabditis elegans
  - Danio rerio
  - Drosophila melanogaster
  - Gallus gallus
  - Homo sapiens
  - Mus musculus
- Genome Reviews:
  - all fully sequenced Bacteria.
  - all fully sequenced Archaea.
  - the following Eukaryota:
    - Dictyostelium discoideum AX4
    - Leishmania major strain Friedlin
    - Leishmania braziliensis
    - Caenorhabditis briggsae
    - Ashbya gossypii ATCC 10895
    - Saccharomyces cerevisiae
    - Candida glabrata CBS 138
    - Kluyveromyces lactis NRRL Y-1140
    - Pichia stipitis CBS 6054
    - Debaryomyces hansenii CBS767
    - Yarrowia lipolytica CLIB122
    - Candida dubliniensis CD36
    - Aspergillus niger CBS 513.88
    - Aspergillus oryzae
    - Penicillium chrysogenum Wisconsin 54-1255
    - Aspergillus nidulans FGSC A4
    - Aspergillus fumigatus AF293
    - Schizosaccharomyces pombe
    - Cryptococcus neoformans var. neoformans JEC21
    - Cryptococcus neoformans var. neoformans B-3501A
    - Ustilago maydis 521
    - Encephalitozoon cuniculi GB-M1
    - Plasmodium falciparum 3D7
    - Plasmodium knowlesi strain H
    - Plasmodium vivax
    - Theileria annulata
    - Toxoplasma gondii RH
    - Cryptosporidium parvum Iowa II
    - Paramecium tetraurelia
    - Guillardia theta
    - Hemiselmis andersenii
    - Arabidopsis thaliana
    - Oryza sativa Japonica group
    - Ostreococcus lucimarinus CCE9901
    - Ostreococcus tauri
The data are modified and re-annotated: sequence names are modified
according to the organism, taxonomy fields are corrected when they are
inconsistent or inaccurate, and gene family, GC content, internal
intron, 3'UTR and 5'UTR information is added to the annotations.
How are PhEVER families built?
PhEVER is made of two databases: PhEVER(dna) which
contains the nucleotide sequences, and PhEVER(aa) which contains the
protein sequences.
The clustering of PhEVER sequences into families follows this
procedure:
- Protein sequences of PhEVER are generated by
  translating the CDS of PhEVER(dna) and using associated
  cross-references to generate the
  annotations.
- To build the families, we perform a similarity search of all the
  proteins against each other with BLASTP2. For this purpose, we use the
  BLOSUM62 similarity matrix and an E-value threshold of 10^-4. Low
  complexity sequences are filtered with SEG. The results are then
  processed as follows:
  - For each pair of sequences, High-scoring Segment Pairs (HSPs)
    that are not compatible with a global alignment are removed.
  - Two sequences in a pair are included in the same family if:
    - The remaining HSPs cover at least 60% of the proteins' length
      (and at least 100 aa).
    - Their identity is greater than or equal to 35% (two
      amino acids are considered similar if their BLOSUM62 similarity
      score is positive).
- We use simple transitive links to build our families. If a pair
  of sequences A + B and a pair of sequences B + C
  fulfill the conditions listed above, then A, B and C are
  placed in the same family, even if the pair A + C does not fulfill these conditions.
The current release was built using SiLiX.
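The pair criteria and the transitive linking described above can be illustrated with a minimal Python sketch. This is not the SiLiX implementation: the PairHit record, the passes and build_families functions, and the example values are all hypothetical, and the sketch assumes that the 60% coverage is measured against the longer of the two proteins, which the description above does not specify. Transitive linking is implemented here with a simple union-find structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairHit:
    """Hypothetical summary of the HSPs retained for one pair of proteins."""
    a: str
    b: str
    aligned_len: int  # residues covered by the retained HSPs
    identity: float   # fraction of identical residues, between 0 and 1
    len_a: int        # length of protein a, in amino acids
    len_b: int        # length of protein b, in amino acids


def passes(hit, min_cov=0.60, min_len=100, min_id=0.35):
    """Apply the three pair criteria (coverage, minimum length, identity)."""
    # Assumption: coverage is computed relative to the longer protein.
    longest = max(hit.len_a, hit.len_b)
    return (hit.aligned_len >= min_cov * longest
            and hit.aligned_len >= min_len
            and hit.identity >= min_id)


def build_families(hits):
    """Group sequences by transitive links between accepted pairs."""
    parent = {}

    def find(x):
        # Return the representative of x, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        root_a, root_b = find(hit.a), find(hit.b)
        if passes(hit) and root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = defaultdict(set)
    for seq in list(parent):
        families[find(seq)].add(seq)
    return sorted(sorted(members) for members in families.values())


# Invented example values, for illustration only.
example_hits = [
    PairHit("A", "B", aligned_len=150, identity=0.40, len_a=200, len_b=220),
    PairHit("B", "C", aligned_len=140, identity=0.36, len_a=220, len_b=210),
    PairHit("A", "C", aligned_len=90, identity=0.30, len_a=200, len_b=210),
    PairHit("D", "E", aligned_len=120, identity=0.50, len_a=400, len_b=380),
]
print(build_families(example_hits))
# prints [['A', 'B', 'C'], ['D'], ['E']]
```

In this example A and C end up in the same family through B although the pair A + C fails the criteria, while D and E stay apart because their alignment covers less than 60% of the longer protein.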
For each family, a complete alignment of all sequences is then
computed using MUSCLE and a phylogenetic tree is estimated with
PhyML.