bpp::RAA Class Reference

Network access to sequence databases (embl, genbank, swissprot, and others).

#include <RAA.h>


Public Member Functions

Opening/closing database connections.
 RAA (const std::string &dbname, int port=5558, const std::string &server="pbil.univ-lyon1.fr") throw (int)
 Direct constructor: opens a network connection to a database.
 RAA (int port=5558, const std::string &server="pbil.univ-lyon1.fr") throw (int)
 Direct constructor: opens a network connection to a database server, without specifying a database.
 ~RAA ()
 Destructor: closes both the database access, if any, and the network connection.
int openDatabase (const std::string &dbname, char *(*getpasswordf)(void *)=NULL, void *p=NULL)
 Opens a database from its name.
void closeDatabase ()
 Closes a database connection.
int knownDatabases (vector< std::string > &name, vector< std::string > &description)
 Computes the list of names and descriptions of databases served by the server.
Access to sequence data and annotations.
RaaSeqAttributes * getAttributes (const std::string &name_or_accno)
 Returns several attributes of a sequence from its name or accession number.
RaaSeqAttributes * getAttributes (int seqrank)
 Returns several attributes of a sequence from its database rank.
Sequence * getSeq (const std::string &name_or_accno, int maxlength=100000)
 Returns a database sequence identified by name or accession number.
Sequence * getSeq (int seqrank, int maxlength=100000)
 Returns a sequence identified by its database rank.
int getSeqFrag (int seqrank, int first, int length, std::string &sequence)
 Returns any part of a sequence identified by its database rank.
int getSeqFrag (const std::string &name_or_accno, int first, int length, std::string &sequence)
 Returns any part of a sequence identified by its name or accession number.
std::string getFirstAnnotLine (int seqrank)
 Returns the first annotation line of the sequence of given database rank.
std::string getNextAnnotLine ()
 Returns the next annotation line after that previously read, or NULL if the end of the database file was reached.
RaaAddress getCurrentAnnotAddress ()
 Returns information identifying the position of the last read annotation line.
std::string getAnnotLineAtAddress (RaaAddress address)
 Returns the annotation line at the given address.
Sequence * translateCDS (int seqrank) throw (BadCharException)
 Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.
Sequence * translateCDS (const std::string &name) throw (BadCharException)
 Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.
char translateInitCodon (int seqrank)
 Returns the amino acid translation of the first codon of a protein-coding (sub)sequence.
Creation of lists of sequences, species or keywords.
RaaList * processQuery (const std::string &query, const std::string &listname) throw (std::string)
 Returns the list of database elements (often sequences) matching a query.
RaaList * createEmptyList (const std::string &listname, const std::string &kind=RaaList::LIST_SEQUENCES) throw (int)
 Creates an empty list with specified name.
void deleteList (RaaList *list)
 Deletes a list and calls its destructor.
Access to feature table-defined sequences (nucleotide databases only).
RaaList * getDirectFeature (const std::string &seqname, const std::string &featurekey, const std::string &listname, const std::string &matching="")
 Computes the list of subsequences of a given sequence corresponding to a given feature key with optional annotation string matching.
vector< std::string > listDirectFeatureKeys ()
 Gives all feature keys of the database that can be directly accessed.
vector< std::string > listAllFeatureKeys ()
 Gives all feature keys of the database.
void * prepareGetAnyFeature (int seqrank, const std::string &featurekey) throw (std::string)
 Starts extraction of all features of a specified key present in the feature table of a database sequence.
Sequence * getNextFeature (void *opaque)
 Successively returns features specified in a previous prepareGetAnyFeature() call.
void interruptGetAnyFeature (void *opaque)
 Terminates a feature extraction session initiated by a prepareGetAnyFeature() call before a getNextFeature() call has returned NULL.
Browsing database species and keywords.
RaaSpeciesTree * loadSpeciesTree (bool showprogress=true)
 Loads the database's full species tree classification.
void freeSpeciesTree (RaaSpeciesTree *tree)
 Frees the memory occupied by the species tree classification.
int keywordPattern (const std::string &pattern)
 Initializes pattern-matching in database keywords. Matching keywords are then returned by successive nextMatchingKeyword() calls.
int nextMatchingKeyword (std::string &matching)
 Finds next matching keyword in database.

Protected Attributes

raa_db_access * raa_data

Friends

class RaaList


Detailed Description

Network access to sequence databases (embl, genbank, swissprot, and others).

This class provides network access to several nucleotide and protein sequence databases structured for multi-criteria retrieval under the ACNUC system, as described in "Remote access to ACNUC nucleotide and protein sequence databases at PBIL".

The databases currently served can be listed with knownDatabases(). EMBL and GenBank are updated daily; SwissProt (in fact UniProt, which includes SwissProt and trEMBL) is updated at each partial release; EMBLwgs is updated at each full release (that is, quarterly).

Single sequences can be accessed by name or accession number, and lists of sequences by a query combining several retrieval criteria. Any fragment of any sequence, defined by coordinates or by feature table entries, can be retrieved. Sequence annotations can also be read. Several databases can be accessed concurrently.

Access is possible to database entries and also to subsequences, i.e., one or more fragments of one or more parent sequences defined by a feature table entry. Subsequences are named by adding an extension (e.g., .PE1) to the name of their parent sequence.
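The naming rule for subsequences can be illustrated with a small helper. This is only a sketch: splitSubsequenceName() is a hypothetical function, not part of RAA, and it assumes that the extension is whatever follows the last '.' in the name. The name "AE005174.PE1" mentioned below is an illustrative combination of a sequence name and an extension that appear on this page.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not part of RAA): splits "PARENT.EXT" into {"PARENT", "EXT"}.
// Assumption: the extension is whatever follows the last '.' in the name.
std::pair<std::string, std::string> splitSubsequenceName(const std::string& name) {
    const std::string::size_type dot = name.rfind('.');
    if (dot == std::string::npos) {
        return {name, ""};  // plain database entry, no extension
    }
    return {name.substr(0, dot), name.substr(dot + 1)};
}
```

Under that assumption, splitSubsequenceName("AE005174.PE1") yields the parent name "AE005174" and the extension "PE1", while a name without a dot is returned unchanged with an empty extension.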


Constructor & Destructor Documentation

bpp::RAA::RAA ( const std::string &  dbname,
int  port = 5558,
const std::string &  server = "pbil.univ-lyon1.fr" 
) throw (int)

Direct constructor: opens a network connection to a database.

Parameters:
dbname The database name (e.g., "embl", "genbank", "swissprot").
port The IP port number of the server (the default value is a safe choice; make sure that no firewall blocks outbound connections on this port).
server The IP name of the server (the default value is a safe choice).
Exceptions:
int An error code as follows:
1: incorrect server name
2: cannot create connection with server
3: unknown database name
4: database is currently not available for remote connection
7: not enough memory
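
Because the constructor reports failure by throwing an int, callers usually want to turn that code into a readable message. The following is a minimal sketch under stated assumptions: describeRaaConstructorError() and connectOrExplain() are hypothetical helpers, not part of RAA, and the messages are copied from the list above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: documented meaning of an int thrown by an RAA constructor.
std::string describeRaaConstructorError(int code) {
    switch (code) {
        case 1: return "incorrect server name";
        case 2: return "cannot create connection with server";
        case 3: return "unknown database name";
        case 4: return "database is currently not available for remote connection";
        case 7: return "not enough memory";
        default: return "undocumented error code";
    }
}

// Runs `connect` (any callable that may throw an int, as the constructors do)
// and returns an empty string on success or the documented message on failure.
template <typename Connect>
std::string connectOrExplain(Connect connect) {
    try {
        connect();
        return "";
    } catch (int code) {
        return describeRaaConstructorError(code);
    }
}
```

With Bio++ available, the callable passed to connectOrExplain() would be the code that constructs the RAA object; the sketch itself compiles without the library.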

bpp::RAA::RAA ( int  port = 5558,
const std::string &  server = "pbil.univ-lyon1.fr" 
) throw (int)

Direct constructor: opens a network connection to a database server, without specifying a database.

Typical usage is to ask with knownDatabases() for the list of served databases, and then to open the chosen database with openDatabase().

Parameters:
port The IP port number of the server (the default value is a safe choice; make sure that no firewall blocks outbound connections on this port).
server The IP name of the server (the default value is a safe choice).
Exceptions:
int An error code as follows:
1: incorrect server name
2: cannot create connection with server
7: not enough memory


Member Function Documentation

void RAA::closeDatabase (  ) 

Closes a database connection.

This allows another database to be opened later with openDatabase() on the same RAA object.

RaaList* bpp::RAA::createEmptyList ( const std::string &  listname,
const std::string &  kind = RaaList::LIST_SEQUENCES 
) throw (int)

Creates an empty list with specified name.

Parameters:
listname A name to be given to the resulting list. Case is not significant.
kind Nature of the resulting list. One of RaaList::LIST_SEQUENCES, RaaList::LIST_KEYWORDS, RaaList::LIST_SPECIES.
Returns:
The resulting list, unless an exception was raised.
Exceptions:
int 3: a list with same name already existed; it is left unchanged.
4: the server cannot create more lists.

void RAA::deleteList ( RaaList *  list  ) 

Deletes a list and calls its destructor.

Parameters:
list An RaaList object.

void RAA::freeSpeciesTree ( RaaSpeciesTree *  tree  ) 

Frees the memory occupied by the species tree classification.

Parameters:
tree An object previously returned by a loadSpeciesTree() call. It is deleted upon return.

string RAA::getAnnotLineAtAddress ( RaaAddress  address  ) 

Returns the annotation line at the given address.

Parameters:
address Information identifying the position of an annotation line typically obtained from a previous call to getCurrentAnnotAddress().
Returns:
The annotation line at that position (in static memory, without terminal \n).

RaaSeqAttributes * RAA::getAttributes ( int  seqrank  ) 

Returns several attributes of a sequence from its database rank.

Parameters:
seqrank The database rank of a sequence.
Returns:
Several attributes (length, species, etc.; see RaaSeqAttributes) of a sequence, or NULL if seqrank is not a valid database sequence rank.

RaaSeqAttributes* bpp::RAA::getAttributes ( const std::string &  name_or_accno  ) 

Returns several attributes of a sequence from its name or accession number.

Parameters:
name_or_accno A sequence name or accession number. Case is not significant.
Returns:
Several attributes (length, species, etc.; see RaaSeqAttributes) of a sequence.

RaaAddress RAA::getCurrentAnnotAddress (  ) 

Returns information identifying the position of the last read annotation line.

Returns:
Information identifying the position of the last read annotation line.

RaaList* bpp::RAA::getDirectFeature ( const std::string &  seqname,
const std::string &  featurekey,
const std::string &  listname,
const std::string &  matching = "" 
)

Computes the list of subsequences of a given sequence corresponding to a given feature key with optional annotation string matching.

This function retrieves all features of the given sequence that correspond to a given feature key and whose annotation optionally contains a given string.
Example:
getDirectFeature("AE005174", "tRNA", "mytrnas", "anticodon: TTG")
retrieves all tRNA features present in the feature table of sequence AE005174 that contain the string "anticodon: TTG" in their annotations, and puts them in a sequence list called "mytrnas". This function is meaningful with nucleotide sequence databases only (not with protein databases).

Parameters:
seqname The name of a database sequence. Case is not significant.
featurekey A feature key (e.g., CDS, tRNA, ncRNA) that must be directly accessible, that is, one of those returned by listDirectFeatureKeys(). Case is not significant.
listname The name to give to the resulting sequence list.
matching An optional string required to be present in the feature's annotations. Case is not significant.
Returns:
The list of subsequences of seqname that correspond to the specified feature key and, optionally, whose annotation contains the matching string, or NULL if no matching sequence exists.

string RAA::getFirstAnnotLine ( int  seqrank  ) 

Returns the first annotation line of the sequence of given database rank.

Parameters:
seqrank Database rank of a sequence.
Returns:
The first annotation line of this sequence (without terminal \n).

string RAA::getNextAnnotLine (  ) 

Returns the next annotation line after that previously read, or NULL if the end of the database file was reached.

Returns:
The next annotation line after that previously read (without terminal \n).

Sequence * RAA::getNextFeature ( void *  opaque  ) 

Successively returns features specified in a previous prepareGetAnyFeature() call.

This function must be called repeatedly until it returns NULL or until interruptGetAnyFeature() is called. Features are processed in their order of appearance in the feature table.

Parameters:
opaque A pointer returned by a previous prepareGetAnyFeature() call.
Returns:
A sequence corresponding to one of the features specified in the prepareGetAnyFeature() call, or NULL if no more such feature exists.

Sequence * RAA::getSeq ( int  seqrank,
int  maxlength = 100000 
)

Returns a sequence identified by its database rank.

Because nucleotide database sequences can be several megabases in length, the maxlength argument avoids unexpected huge sequence downloads.

Parameters:
seqrank The database rank of a sequence.
maxlength The maximum sequence length beyond which the function returns NULL.
Returns:
The database sequence including a one-line comment, or NULL if seqrank does not match any sequence or if the sequence length exceeds maxlength.

Sequence* bpp::RAA::getSeq ( const std::string &  name_or_accno,
int  maxlength = 100000 
)

Returns a database sequence identified by name or accession number.

Because nucleotide database sequences can be several megabases in length, the maxlength argument avoids unexpected huge sequence downloads.

Parameters:
name_or_accno A sequence name or accession number. Case is not significant.
maxlength The maximum sequence length beyond which the function returns NULL.
Returns:
The database sequence including a one-line comment, or NULL if name_or_accno does not match any sequence or if the sequence length exceeds maxlength.

int bpp::RAA::getSeqFrag ( const std::string &  name_or_accno,
int  first,
int  length,
std::string &  sequence 
)

Returns any part of a sequence identified by its name or accession number.

Parameters:
name_or_accno The name or accession number of a sequence. Case is not significant.
first The first desired position within the sequence (1 is the smallest valid value).
length The desired number of residues (can be larger than what exists in the sequence).
sequence Filled upon return with requested sequence data.
Returns:
The length of returned sequence data, or 0 if impossible.

int bpp::RAA::getSeqFrag ( int  seqrank,
int  first,
int  length,
std::string &  sequence 
)

Returns any part of a sequence identified by its database rank.

Parameters:
seqrank The database rank of a sequence.
first The first desired position within the sequence (1 is the smallest valid value).
length The desired number of residues (can be larger than what exists in the sequence).
sequence Filled upon return with requested sequence data.
Returns:
The length of returned sequence data, or 0 if impossible.
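
Since the first position is 1-based and the requested length may exceed what remains, a long sequence can be read piece by piece. The sketch below is an illustration under stated assumptions: fetchInChunks() is a hypothetical helper, it treats a short read as the end of the sequence, and FakeSeqDb is a stand-in that models the documented contract locally so that the example compiles without Bio++ or a network connection. It is not the library's implementation.

```cpp
#include <cassert>
#include <string>

// Reads a whole sequence in pieces of at most `chunk` residues. `Db` is any type
// with a getSeqFrag(int, int, int, std::string&) member following the contract
// documented above (1-based `first`, return value = number of residues delivered).
template <typename Db>
std::string fetchInChunks(Db& db, int seqrank, int chunk) {
    assert(chunk > 0);
    std::string whole, piece;
    int first = 1;  // positions are 1-based
    for (;;) {
        const int got = db.getSeqFrag(seqrank, first, chunk, piece);
        if (got <= 0) break;     // nothing (more) to read
        whole.append(piece, 0, got);
        if (got < chunk) break;  // assumption: a short read means the end was reached
        first += got;
    }
    return whole;
}

// Stand-in that models the documented contract locally; not the library's code.
struct FakeSeqDb {
    std::string residues;
    int getSeqFrag(int /*seqrank*/, int first, int length, std::string& sequence) const {
        if (first < 1 || length < 1 || first > static_cast<int>(residues.size())) return 0;
        sequence = residues.substr(first - 1, length);  // substr clips at the end
        return static_cast<int>(sequence.size());
    }
};
```

The same template would accept an RAA object in place of the stand-in, because it relies only on the getSeqFrag() signature shown above.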

void RAA::interruptGetAnyFeature ( void *  opaque  ) 

Terminates a feature extraction session initiated by a prepareGetAnyFeature() call before a getNextFeature() call has returned NULL.

Parameters:
opaque A pointer returned by a previous prepareGetAnyFeature() call.

int bpp::RAA::keywordPattern ( const std::string &  pattern  ) 

Initializes pattern-matching in database keywords. Matching keywords are then returned by successive nextMatchingKeyword() calls.

Parameters:
pattern A pattern-matching string using @ as wildcard (example: RNA@polymerase@). Case is not significant.
Returns:
The maximum length of any database keyword.
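
To make the pattern syntax concrete, here is a local matcher that mimics it. This is a sketch only: the real matching is performed by the server, matchesKeywordPattern() is a hypothetical helper, and it assumes that @ stands for any run of characters, including an empty one.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical local model of the pattern syntax: '@' matches any run of
// characters (assumed to include the empty run); case is not significant.
bool matchesKeywordPattern(const std::string& pattern, const std::string& keyword) {
    const auto lower = [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };
    const std::size_t none = std::string::npos;
    std::size_t p = 0, k = 0;
    std::size_t lastWildcard = none;  // position of the most recent '@' in pattern
    std::size_t resume = 0;           // keyword position where that '@' started matching
    while (k < keyword.size()) {
        if (p < pattern.size() && pattern[p] == '@') {
            lastWildcard = p++;
            resume = k;
        } else if (p < pattern.size() && lower(pattern[p]) == lower(keyword[k])) {
            ++p;
            ++k;
        } else if (lastWildcard != none) {
            p = lastWildcard + 1;  // backtrack: let the last '@' absorb one more character
            k = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '@') ++p;  // trailing wildcards match nothing
    return p == pattern.size();
}
```

Under these assumptions the pattern RNA@polymerase@ accepts the keyword "rna polymerase II" and rejects "DNA polymerase".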

int bpp::RAA::knownDatabases ( vector< std::string > &  name,
vector< std::string > &  description 
)

Computes the list of names and descriptions of databases served by the server.

Typically used after creation of an RAA object without database and before openDatabase() call.

Returns:
The number of served databases.
Parameters:
name Vector of database names. Any of these names can be used in openDatabase() calls.
description Vector of database descriptions. A description can begin with "(offline)" to mean the database is currently not available.
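
The two vectors are parallel, so the "(offline)" marker can be used to keep only the databases that are currently usable. A minimal sketch, assuming the marker appears exactly as "(offline)" at the very start of the description; onlineDatabases() is a hypothetical helper, not part of RAA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keeps the names whose description does not start with "(offline)".
std::vector<std::string> onlineDatabases(const std::vector<std::string>& name,
                                         const std::vector<std::string>& description) {
    assert(name.size() == description.size());  // the two vectors are parallel
    const std::string marker = "(offline)";
    std::vector<std::string> online;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (description[i].compare(0, marker.size(), marker) != 0) {
            online.push_back(name[i]);
        }
    }
    return online;
}
```

Any name in the returned vector could then be passed to openDatabase().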

vector< string > RAA::listAllFeatureKeys (  ) 

Gives all feature keys of the database.

These feature keys (e.g., CDS, conflict, misc_feature) can be used with function prepareGetAnyFeature(). This function is meaningful with nucleotide sequence databases only (not with protein databases).

Returns:
A string vector listing all feature keys of the database.

vector< string > RAA::listDirectFeatureKeys (  ) 

Gives all feature keys of the database that can be directly accessed.

These feature keys (e.g., CDS, rRNA, tRNA) can be used with function getDirectFeature(). This function is meaningful with nucleotide sequence databases only (not with protein databases).

Returns:
A string vector listing all feature keys of the database that can be directly accessed.

RaaSpeciesTree * RAA::loadSpeciesTree ( bool  showprogress = true  ) 

Loads the database's full species tree classification.

This call takes a few seconds to run on large databases because a large amount of data is downloaded from the server.

Parameters:
showprogress If true, progress information gets sent to stdout.
Returns:
An object allowing work with the full species tree (see RaaSpeciesTree), or NULL if error.

int bpp::RAA::nextMatchingKeyword ( std::string &  matching  ) 

Finds next matching keyword in database.

Parameters:
matching Set to the next matching keyword upon return.
Returns:
The database rank of the next matching keyword, or 0 if there are no more matching keywords.

int bpp::RAA::openDatabase ( const std::string &  dbname,
char *(*)(void *)  getpasswordf = NULL,
void *  p = NULL 
)

Opens a database from its name.

Parameters:
dbname The database name (e.g., "embl", "genbank", "swissprot").
getpasswordf NULL, or, for a password-protected database, pointer to a password-providing function that returns the password as a writable static char string.
p NULL, or pointer to data transmitted as argument of getpasswordf.
Returns:
0 if OK, or an error code as follows:
3: unknown database name
4: database is currently not available for remote connection
5: a database was previously opened on this RAA object and not closed
6: incorrect password for password-protected database
7: not enough memory
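
For a password-protected database, the callback must have the type char *(*)(void *) and return a writable static string. The following sketch shows one way to write such a function; PasswordContext and providePassword() are hypothetical names, and the 128-byte buffer size is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical context passed through the `p` argument of openDatabase().
struct PasswordContext {
    std::string password;
};

// Matches the documented callback type char *(*)(void *): copies the password into a
// static, writable buffer and returns it. Overlong passwords are truncated.
char* providePassword(void* p) {
    static char buffer[128];
    const PasswordContext* context = static_cast<const PasswordContext*>(p);
    assert(context != nullptr);
    std::strncpy(buffer, context->password.c_str(), sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';
    return buffer;
}
```

Given the signature above, such a callback would be passed as the second argument of openDatabase(), with the address of a PasswordContext as the third.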

void* bpp::RAA::prepareGetAnyFeature ( int  seqrank,
const std::string &  featurekey 
) throw (std::string)

Starts extraction of all features of a specified key present in the feature table of a database sequence.

A database sequence can contain many instances of a given feature key in its feature table. Feature extraction therefore proceeds in two steps: first prepare the extraction, then extract features one at a time with getNextFeature() calls until none remain in the feature table or until interruptGetAnyFeature() is called. Any successful prepareGetAnyFeature() call must be followed by getNextFeature() calls until one returns NULL, or by a call to interruptGetAnyFeature(); calling any other RAA member function in between is prohibited.

Parameters:
seqrank The database rank of a sequence.
featurekey Any feature key (direct or not) defined in EMBL/GenBank/DDBJ feature tables. These are also returned by listAllFeatureKeys().
Returns:
An opaque pointer to be transmitted to functions getNextFeature() or interruptGetAnyFeature().
Exceptions:
string A message indicating the cause of the error.
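
The prepare / next / interrupt protocol described above can be sketched as a single loop. This is an illustration under stated assumptions: forEachFeature() is a hypothetical helper, written as a template so that it compiles without Bio++, and FakeFeatureDb is a stand-in used only to exercise it. This page does not state who owns the returned feature, so the sketch leaves that decision to the visitor.

```cpp
#include <cassert>
#include <string>

// Sketch of the prepare / next / interrupt protocol. `Db` is any type exposing
// prepareGetAnyFeature(), getNextFeature() and interruptGetAnyFeature() with the
// signatures documented on this page; `visit` receives each non-null feature pointer.
// Returns the number of features visited. A std::string thrown by
// prepareGetAnyFeature() propagates to the caller.
template <typename Db, typename Visit>
int forEachFeature(Db& db, int seqrank, const std::string& featurekey,
                   Visit visit, int maxFeatures = -1) {
    void* session = db.prepareGetAnyFeature(seqrank, featurekey);
    int visited = 0;
    for (;;) {
        if (maxFeatures >= 0 && visited >= maxFeatures) {
            db.interruptGetAnyFeature(session);  // stop early, before NULL was returned
            break;
        }
        auto* feature = db.getNextFeature(session);
        if (feature == nullptr) {
            break;  // session ended normally; no interrupt needed
        }
        visit(feature);  // ownership of `feature` is left to the visitor
        ++visited;
    }
    return visited;
}

// Stand-in used only to exercise the sketch without Bio++ or a network connection.
struct FakeFeatureDb {
    int remaining = 3;
    bool interrupted = false;
    std::string current;
    void* prepareGetAnyFeature(int, const std::string&) { return this; }
    std::string* getNextFeature(void*) {
        if (remaining == 0) return nullptr;
        --remaining;
        current = "feature";
        return &current;
    }
    void interruptGetAnyFeature(void*) { interrupted = true; }
};
```

No other member function of the database object is called between the prepare call and the end of the session, which is the constraint stated above.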

RaaList* bpp::RAA::processQuery ( const std::string &  query,
const std::string &  listname 
) throw (std::string)

Returns the list of database elements (often sequences) matching a query.

Query examples:
k=ribosomal protein L14
sp=felis catus and t=cds

Parameters:
query A retrieval query following the ACNUC query syntax.
listname A name to be given to the resulting list. Case is not significant. If a list with same name already exists, it is replaced by the new list.
Returns:
The resulting list of matching database elements.
Exceptions:
string If error, the string is a message describing the error cause.
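
The second query example above combines two criteria with "and". A query of that shape can be assembled from separate criteria as sketched here; joinCriteriaWithAnd() is a hypothetical helper, and it covers only this one form of combination, not the full query syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins individual criteria (e.g. "sp=felis catus", "t=cds")
// into one query string of the form "a and b and c".
std::string joinCriteriaWithAnd(const std::vector<std::string>& criteria) {
    std::string query;
    for (std::size_t i = 0; i < criteria.size(); ++i) {
        if (i > 0) query += " and ";
        query += criteria[i];
    }
    return query;
}
```

The resulting string would be passed as the query argument of processQuery(), which throws a std::string describing the problem if the query is rejected.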

Sequence* bpp::RAA::translateCDS ( const std::string &  name  )  throw (BadCharException)

Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.

Parameters:
name The name of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS.
Returns:
The complete protein translation of this CDS, using the genetic code suggested by the sequence annotations and with a one-line comment, or NULL if name does not match a CDS or if not enough memory.
Exceptions:
BadCharException In rare cases, the CDS may contain an internal stop codon that raises an exception when translated to protein.

Sequence * RAA::translateCDS ( int  seqrank  )  throw (BadCharException)

Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.

Parameters:
seqrank The database rank of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS.
Returns:
The complete protein translation of this CDS, using the genetic code suggested by the sequence annotations and with a one-line comment, or NULL if seqrank does not match a CDS or if not enough memory.
Exceptions:
BadCharException In rare cases, the CDS may contain an internal stop codon that raises an exception when translated to protein.

char RAA::translateInitCodon ( int  seqrank  ) 

Returns the amino acid translation of the first codon of a protein-coding (sub)sequence.

Parameters:
seqrank The database rank of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS.
Returns:
The amino acid corresponding to the start codon of this sequence, using the adequate initiation-codon-specific genetic code.



Generated on Sat Sep 5 17:53:47 2009 for RAA by  doxygen 1.5.9