Serial Analysis of Gene Expression (SAGE) is a method for large-scale gene expression analysis that can potentially generate the full list of mRNAs present within a cell population at a given time, together with their frequencies. The method is based on the assumption that short cDNA sequences (tags) are in most cases sufficient to identify transcripts. Counting the tags thus provides a numerical measure of the abundance of each transcript. An essential step in SAGE library analysis is the unambiguous assignment of each tag to the transcript from which it is derived, which is called tag-to-gene mapping.
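As a minimal illustration of how tag counting yields an abundance estimate, the sketch below tallies the tags of a toy library and reports each tag's count and relative frequency. The function name `tag_frequencies` and the example tags are invented for illustration and are not part of Identitag.

```python
from collections import Counter


def tag_frequencies(observed_tags):
    """Return {tag: (count, relative_frequency)} for a list of observed tags."""
    counts = Counter(observed_tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}


# Hypothetical toy library of 10-base tags (invented for illustration).
library = ["AAAAAAAAAA", "CCCCCCCCCC", "AAAAAAAAAA", "GGGGGGGGGG"]
freqs = tag_frequencies(library)

for tag, (count, frequency) in sorted(freqs.items()):
    print(tag, count, frequency)
```

In a real library the relative frequency is only an estimate of transcript abundance, since it depends on sequencing depth and on each tag being assigned to the right transcript.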

We designed and implemented a tool called Identitag for tag-to-gene mapping. This tool is based on a relational database whose structure can be depicted as three interconnected modules, represented in the Identitag relational schema. The first stores virtual tags extracted from transcript sequences belonging to the species considered, the second stores experimental tags observed in SAGE experiments, and the third allows the annotation of the transcript sequences used for virtual tag extraction. Identitag therefore connects an observed tag to a virtual tag, to the transcript sequence from which it is derived, and then to its functional annotation when available. For a complete description of the Identitag tables, see the Identitag data dictionary.
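The chain described above (observed tag, virtual tag, transcript, annotation) can be sketched in a few lines of Python. This is not Identitag's code: it assumes the classic SAGE protocol, in which the anchoring enzyme NlaIII cuts at CATG and the tag is the 10 bases following the 3'-most site, and it uses plain dictionaries in place of the database tables. All names (`virtual_tag`, `build_tag_index`, `map_tag`) and all sequences are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"   # assumption: NlaIII anchoring enzyme site
TAG_LENGTH = 10   # assumption: classic 10-base SAGE tag


def virtual_tag(sequence, anchor=ANCHOR, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None if absent."""
    seq = sequence.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == tag_length else None


def build_tag_index(transcripts):
    """Map each virtual tag to the set of transcript names that produce it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].add(name)
    return index


def map_tag(observed_tag, index, annotations):
    """Return (transcript, annotation) pairs matching an observed tag."""
    names = sorted(index.get(observed_tag, ()))
    return [(name, annotations.get(name)) for name in names]


# Hypothetical toy data (invented for illustration).
transcripts = {
    "tx1": "GGGCATGAAACCCGGGTTTAAA",
    "tx2": "TTTCATGCCCCATGAAACCCGGGTAC",
    "tx3": "ACGTACGTACGT",  # no anchor site, so no virtual tag
}
annotations = {"tx1": "hypothetical gene A"}

index = build_tag_index(transcripts)
matches = map_tag("AAACCCGGGT", index, annotations)
print(matches)
```

Here the observed tag matches two transcripts, which is exactly the kind of ambiguity that tag-to-gene mapping has to surface; a tag with a single match would be assigned unambiguously.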

Databases built for different species can be connected through orthology relationships, thus allowing the comparison of SAGE libraries between species. We designed a method to search for putative orthologous sequences between two sets of transcript sequences that may be redundant and may not represent the entire transcriptome of the two species considered. Identitag can thus be used for comparative transcriptomic analysis.

Use of Identitag for tag-to-gene mapping

This website provides Identitag sources that can be used to build an Identitag database for any species for which transcript sequences are available. These sources can be used on SUN, Linux and MacOSX operating systems, with Bourne Shell and Perl interpreters and a MySQL client (a MySQL server is also needed, but it can run on another host). For more details, please read the Identitag documentation for tag-to-gene mapping.

Use of Identitag for comparative transcriptomic analysis

Before performing a comparative transcriptomic analysis, two Identitag databases must be built, one for each species (see previous section). This website then provides sources for connecting these two Identitag databases through orthology relationships: the scripts provided search for putative orthologous sequences between the two sets of transcript sequences from the two species, and the putative orthology relationships found are then loaded into the Identitag database. In addition to the requirements specified in the previous section, the blastall, blastclust and formatdb executables must be operational on your computer. For more details, please read the Identitag documentation for comparative transcriptomic analysis.
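The exact criteria used by the Identitag scripts are given in their documentation, not on this page. As a rough illustration of how putative orthologs can be derived from BLAST results, the sketch below applies the common reciprocal-best-hit heuristic, which is an assumption here and not a description of the Identitag method. It takes hits as (query, subject, bit score) triples, which could for example be read from BLAST tabular output; the function names and sequence identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) triples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy hits between two species (invented for illustration).
a_vs_b = [("hs1", "mm1", 200.0), ("hs1", "mm2", 150.0), ("hs2", "mm2", 90.0)]
b_vs_a = [("mm1", "hs1", 198.0), ("mm2", "hs1", 150.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy case only hs1 and mm1 are retained: hs2's best hit is mm2, but mm2's best hit is hs1, so that pair is not reciprocal. With redundant or incomplete sequence sets, a stricter or looser rule may be needed.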

Download Identitag

Identitag sources for tag-to-gene mapping can be downloaded here. Sources for connecting two Identitag databases for two different species can be downloaded here.


If you use Identitag in a published work, please cite the following reference:

Keime, C., Damiola, F., Mouchiroud, D., Duret, L. and Gandrillon, O. (2004) Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries. BMC Bioinformatics, 5, 143 [Abstract] [Full text].

If you encounter any problems when using Identitag, please contact Céline Keime.

