PBIL

HOMOLENS : Homologous Sequences in Ensembl Animal Genomes

HOMOLENS release 05 (January 2011)

Release information: Protein, Nucleotide

Previous release

HOMOLENS is a database of homologous genes from Ensembl organisms and Ensembl families, structured under the ACNUC sequence database management system. It allows users to select sets of homologous genes among species and to visualize multiple alignments and phylogenetic trees. It is also possible to search for orthologous genes in a wide range of taxa. HOMOLENS is thus particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOMOLENS gives an overall view of what is known about a particular gene family. Note that HOMOLENS is split into two databases on this server: HOMOLENS contains the protein sequences, while HOMOLENSDNA contains the nucleotide sequences. The protein sequences of HOMOLENS were generated by translating the CDS of HOMOLENSDNA, and their annotations were generated from the associated cross-references.

New! We no longer calculate families. Families and alignments are now taken directly from Ensembl. Phylogenetic trees, however, are not taken from Ensembl but calculated from the alignments with PhyML. These trees can be used to retrieve orthologs with FamFetch.

Query HOMOLENS using web applications

Access to several ACNUC databases: EMBL, GenBank, SwissProt, HOVERGEN, HOGENOM, HOMOLENS, etc.

Retrieve Tree

Retrieving Tree from Ensembl protein identifier

Query:
Please give an Ensembl protein identifier (for example: ENSCSAVP00000020070). The sequence will be highlighted in the associated tree.

Query HOMOLENS proteins

You may enter any word (sequence name, keyword, species, ...) to search either protein sequences or protein families.
Check the "exact match" box if you want to report exact matches only.

Query HOMOLENS nucleotide sequences

You may enter any word (sequence name, keyword, species, ...) to search either CDS sequences or CDS families.
Check the "exact match" box if you want to report exact matches only.

Query HOMOLENS using BLAST

You may blast your sequence against several databases at PBIL.

Query HOMOLENS using HoSeqI

You may search for the HOMOLENS family that is closest to your sequence. The associated alignment and phylogenetic tree are generated automatically.
  • HoSeqI: retrieves the protein family in HOMOLENS that best matches your sequence

Orthologs search

You can retrieve orthologous and paralogous genes with the FamFetch application. This powerful tool lets you query the database of phylogenetic trees with a complex, user-built tree pattern that includes duplication and speciation events. A command-line version of FamFetch is also available.

Access to HOMOLENS

You can query the database on this page or via several access methods:

Contents

Organisms

HOMOLENS is built from the Ensembl database:
  • Ensembl (Release 60): animals from the EBI:
    • Ailuropoda melanoleuca
    • Anolis carolinensis
    • Bos taurus
    • Caenorhabditis elegans
    • Callithrix jacchus
    • Canis lupus familiaris
    • Cavia porcellus
    • Choloepus hoffmanni
    • Ciona intestinalis
    • Ciona savignyi
    • Danio rerio
    • Dasypus novemcinctus
    • Dipodomys ordii
    • Drosophila melanogaster
    • Echinops telfairi
    • Equus caballus
    • Erinaceus europaeus
    • Felis catus
    • Gallus gallus
    • Gasterosteus aculeatus
    • Gorilla gorilla
    • Homo sapiens
    • Lama pacos
    • Loxodonta africana
    • Macaca mulatta
    • Macropus eugenii
    • Microcebus murinus
    • Monodelphis domestica
    • Mus musculus
    • Myotis lucifugus
    • Ochotona princeps
    • Ornithorhynchus anatinus
    • Oryctolagus cuniculus
    • Oryzias latipes
    • Otolemur garnettii
    • Pan troglodytes
    • Pongo pygmaeus
    • Procavia capensis
    • Pteropus vampyrus
    • Rattus norvegicus
    • Saccharomyces cerevisiae
    • Sorex araneus
    • Spermophilus tridecemlineatus
    • Sus scrofa
    • Taeniopygia guttata
    • Takifugu rubripes
    • Tarsius syrichta
    • Tetraodon nigroviridis
    • Tupaia belangeri
    • Tursiops truncatus
    • Xenopus (Silurana) tropicalis
Data are modified and re-annotated: gene family, GC content, internal introns, and 3'UTR and 5'UTR information are added to the annotations.

Sequences, Families, Alignments, Phylogenetic trees

Number of proteins: 1,200,609
Number of CDS: 1,219,109
Number of families (at least 2 sequences): 18,499
Number of orphans: 340,118 (28%)
Number of protein sequences associated with a family: 860,491 (72%)

Phylogenetic trees have been calculated for all 18,499 Ensembl families (containing between 2 and 400 sequences). The trees are calculated with the program PHYML V3.0 (substitution model = LG, estimated proportion of invariable sites, 4 categories, estimated gamma, initial tree with BIONJ, best of "NNI" and "SPR" topology exploration, SH-like branch supports) on conserved blocks of the MUSCLE alignments selected with GBLOCKS.
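
As an illustration only, the parameters listed above can be mapped onto a PhyML 3.0 command line. The sketch below builds such a command as an argument list without running it. The exact invocation used to build HOMOLENS is not given here, so the flag mapping is an assumption based on the PhyML 3.0 command-line options, and the function name, executable name and file name are hypothetical.

```python
def phyml_command(alignment_path, executable="phyml"):
    """Build a PhyML 3.0 argument list matching the settings described above.

    Assumption: this flag mapping is a reconstruction, not the command
    actually used to build HOMOLENS. BIONJ is PhyML's default starting
    tree, so no flag is needed for it.
    """
    return [
        executable,
        "-i", alignment_path,  # PHYLIP-format alignment (conserved blocks)
        "-d", "aa",            # amino-acid data
        "-m", "LG",            # LG substitution model
        "-v", "e",             # estimate the proportion of invariable sites
        "-c", "4",             # 4 rate categories
        "-a", "e",             # estimate the gamma shape parameter
        "-s", "BEST",          # keep the best of NNI and SPR searches
        "-b", "-4",            # SH-like branch supports
    ]


# Hypothetical alignment file name, for illustration.
print(" ".join(phyml_command("family_blocks.phy")))
```

The list form can be passed directly to subprocess.run once PhyML is installed locally.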

Paralogy/Orthology Events Assignment

The phylogenetic trees of each gene family are analysed using RapMasse to assign a duplication or speciation event to each node by comparison with the species tree. For details on ortholog detection, see "Orthologs search".
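
To give an idea of what labelling nodes as duplications or speciations means, here is a minimal sketch. It does not reproduce RapMasse, which compares each gene tree with the species tree. Instead it uses the simpler species-overlap heuristic: a node is called a duplication when its two subtrees share at least one species, and a speciation otherwise. The tree encoding, function names and gene names are hypothetical.

```python
def species_under(node):
    """Return the set of species found at the leaves below a node.

    A leaf is a (gene_name, species) pair of strings; an internal node is a
    (left, right) pair of nodes. This encoding is an assumption made for
    the example.
    """
    if isinstance(node[0], str):
        return {node[1]}
    return species_under(node[0]) | species_under(node[1])


def label_events(node, events=None):
    """List (event, species) for each internal node, in pre-order.

    Species-overlap heuristic, NOT the RapMasse reconciliation: a node is a
    duplication if its two subtrees have a species in common.
    """
    if events is None:
        events = []
    if isinstance(node[0], str):  # leaf: nothing to label
        return events
    left, right = node
    shared = species_under(left) & species_under(right)
    event = "duplication" if shared else "speciation"
    events.append((event, sorted(species_under(node))))
    label_events(left, events)
    label_events(right, events)
    return events


# Two paralogous copies (A and B), each present in human and mouse.
tree = (
    (("geneA_human", "Homo sapiens"), ("geneA_mouse", "Mus musculus")),
    (("geneB_human", "Homo sapiens"), ("geneB_mouse", "Mus musculus")),
)
for event, species in label_events(tree):
    print(event, species)
```

In this example the root is labelled as a duplication, because both subtrees contain human and mouse genes, and the two nodes below it are labelled as speciations.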

HOMOLENS is now available. You can make requests on the protein and the genome data via our web server or via the socket server.

Server mirroring

You do not need to install the server itself to have HOMOLENS running on your computer, as the client is enough for that purpose. On the other hand, you may want to set up your own server in order to speed up your database access and to offer that service to potential users in your geographic area. Installation instructions can be found at http://pbil.univ-lyon1.fr/databases/acnuc/localinstall.html

The whole database is available from our FTP server at URL: ftp://pbil.univ-lyon1.fr/pub/homolens/ Note that it is much more efficient to download the database with a dedicated FTP client than with a web browser.
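
For scripted access, the FTP directory can also be listed with Python's standard ftplib module. This is a minimal sketch: it assumes the server accepts anonymous logins and is still reachable at the URL above, and the function names are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_remote_directory(url, timeout=30):
    """Return the names found in the remote directory.

    Assumption: anonymous login is allowed on the server.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # anonymous login
        ftp.cwd(path)
        return ftp.nlst()


# No network access happens until list_remote_directory() is called.
print(split_ftp_url("ftp://pbil.univ-lyon1.fr/pub/homolens/"))
```

Individual files can then be fetched with FTP.retrbinary, although a dedicated client remains the better choice for downloading the whole database.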

Sequence Annotations

Family annotation

Protein sequences: we add for each entry a line in the CC field that gives the number of the family the sequence belongs to:
CC   -!- GENE_FAMILY: HBG017522.
Genome sequence: we add for each coding sequence a qualifier that gives the number of the family the gene belongs to:
FT                   /gene_family="HBG017522"

This number is incorporated in the keywords associated with the corresponding entry in the ACNUC database structure. It is therefore possible to retrieve all the sequences associated with a family by using this number with the retrieval system Query or its online version WWW-Query.
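
When working with downloaded flat files rather than with Query, the family number can be read back from the two annotation lines shown above. The following is a minimal sketch. The regular expressions are inferred from those two example lines only, so other layouts may need adjustment, and the function name is hypothetical.

```python
import re

# Patterns inferred from the two example lines above (an assumption).
PROTEIN_FAMILY_RE = re.compile(r"^CC\s+-!-\s+GENE_FAMILY:\s*(\S+?)\.?\s*$")
CDS_FAMILY_RE = re.compile(r'^FT\s+/gene_family="([^"]+)"\s*$')


def family_id(line):
    """Return the family number found on an annotation line, or None."""
    for pattern in (PROTEIN_FAMILY_RE, CDS_FAMILY_RE):
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


print(family_id("CC   -!- GENE_FAMILY: HBG017522."))
print(family_id('FT                   /gene_family="HBG017522"'))
```

Both calls return the same family number, which can then serve as a key to group protein and CDS entries.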

GC content and intron information annotations

We include in the genomic sequences the GC content of each coding sequence:
FT                   /%(C+G)="CG<35%"
FT                   /note="C+G content in third codon positions = 31.4 % "
It is thus possible to select sequences according to their GC content.
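
The value reported in the note is the C+G content at third codon positions. A minimal sketch of that calculation is given below. The function name is hypothetical, and how HOMOLENS itself handles stop codons, incomplete codons or ambiguous bases is not stated here, so the choices made in the code are assumptions.

```python
def gc3_percent(cds):
    """Percentage of C or G at third codon positions of a coding sequence.

    Assumptions: the sequence starts in frame, a trailing incomplete codon
    is ignored, and any base other than C or G counts as non-GC.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop a trailing partial codon
    third = seq[2:usable:3]            # bases at positions 3, 6, 9, ...
    if not third:
        raise ValueError("coding sequence shorter than one codon")
    gc = sum(base in "GC" for base in third)
    return 100.0 * gc / len(third)


# Codons ATG, AAA, TTC have third bases G, A, C: two of three are G or C.
print(round(gc3_percent("ATGAAATTC"), 1))
```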

We also include in genomic sequences descriptions of non-coding regions:

  • INT_INT: internal introns (i.e. within CDS)
  • 5'INT: introns in 5'UTR
  • 3'INT: introns in 3'UTR
  • 5'NCR: 5' non-coding region
  • 3'NCR: 3' non-coding region
For example:
FT   3'ncr           2278..2368
These subsequences can be selected and extracted from the database in the same way as CDS, using WWW-Query (see Help).
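
For flat files processed locally, a simple feature line such as the one above can be parsed and the corresponding subsequence cut out. This is a minimal sketch: it handles only plain start..end locations, as in the example, and assumes the usual EMBL feature-table convention of 1-based, inclusive coordinates. The function names are hypothetical.

```python
import re

# Matches only simple "start..end" locations, as in the example above.
FEATURE_RE = re.compile(r"^FT\s+(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_simple_feature(line):
    """Return (key, start, end) for a simple feature line, else None."""
    match = FEATURE_RE.match(line)
    if match is None:
        return None
    key, start, end = match.groups()
    return key, int(start), int(end)


def extract(sequence, start, end):
    """Cut out a region given 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


key, start, end = parse_simple_feature("FT   3'ncr           2278..2368")
print(key, start, end, end - start + 1)   # last value: region length in bases
```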

Contact and reference

If you encounter problems when installing or using HOMOLENS, please contact Laurent Duret or Simon Penel. We also welcome any comments or suggestions on the database and/or its interface.

Acknowledgements

Calculations have been done at the IN2P3 Computing Center.

Licence

HOMOLENS Database
Copyright 2005 CNRS
Authors: Laurent Duret, Manolo Gouy, Simon Penel, Guy Perrière

This database is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A copy of the GNU General Public License is available at ftp://pbil.univ-lyon1.fr/pub/hogenom and http://www.gnu.org/licenses/.

ENSEMBL Database
http://www.ensembl.org/info/about/code_licence.html
Copyright © 1999-2011 The European Bioinformatics Institute and Genome Research Limited, and others. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The name "Ensembl" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact helpdesk@ensembl.org
4. Products derived from this software may not be called "Ensembl" nor may "Ensembl" appear in their names without prior written permission of the Ensembl developers.
5. Redistributions in any form whatsoever must retain the following acknowledgement:
"This product includes software developed by Ensembl (http://www.ensembl.org/)."
THIS SOFTWARE IS PROVIDED BY THE ENSEMBL GROUP "AS IS" AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ENSEMBL GROUP OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

References

If you use families from HOMOLENS or HOGENOM, please cite:
Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M and Perrière G (2009)
"Databases of homologous gene families for comparative genomics" BMC Bioinformatics, 10 (Suppl 6):S3
If you use families from HOMOLENS, please also cite the Ensembl database.