#include <RAA.h>
Public Member Functions | |
Opening/closing database connections. | |
RAA (const std::string &dbname, int port=5558, const std::string &server="pbil.univ-lyon1.fr") throw (int) | |
Direct constructor: opens a network connection to a database. | |
RAA (int port=5558, const std::string &server="pbil.univ-lyon1.fr") throw (int) | |
Direct constructor: opens a network connection to a database server, without specifying a database. | |
~RAA () | |
Destructor: closes both the database access, if any, and the network connection. | |
int | openDatabase (const std::string &dbname, char *(*getpasswordf)(void *)=NULL, void *p=NULL) |
Opens a database from its name. | |
void | closeDatabase () |
Closes a database connection. | |
int | knownDatabases (vector< std::string > &name, vector< std::string > &description) |
Computes the list of names and descriptions of databases served by the server. | |
Access to sequence data and annotations. | |
RaaSeqAttributes * | getAttributes (const std::string &name_or_accno) |
Returns several attributes of a sequence from its name or accession number. | |
RaaSeqAttributes * | getAttributes (int seqrank) |
Returns several attributes of a sequence from its database rank. | |
Sequence * | getSeq (const std::string &name_or_accno, int maxlength=100000) |
Returns a database sequence identified by name or accession number. | |
Sequence * | getSeq (int seqrank, int maxlength=100000) |
Returns a sequence identified by its database rank. | |
int | getSeqFrag (int seqrank, int first, int length, std::string &sequence) |
Returns any part of a sequence identified by its database rank. | |
int | getSeqFrag (const std::string &name_or_accno, int first, int length, std::string &sequence) |
Returns any part of a sequence identified by its name or accession number. | |
std::string | getFirstAnnotLine (int seqrank) |
Returns the first annotation line of the sequence of given database rank. | |
std::string | getNextAnnotLine () |
Returns the next annotation line after that previously read, or NULL if the end of the database file was reached. | |
RaaAddress | getCurrentAnnotAddress () |
Returns information identifying the position of the last read annotation line. | |
std::string | getAnnotLineAtAddress (RaaAddress address) |
Returns the annotation line at the given address. | |
Sequence * | translateCDS (int seqrank) throw (BadCharException) |
Returns the full protein translation of a protein-coding nucleotide database (sub)sequence. | |
Sequence * | translateCDS (const std::string &name) throw (BadCharException) |
Returns the full protein translation of a protein-coding nucleotide database (sub)sequence. | |
char | translateInitCodon (int seqrank) |
Returns the amino acid translation of the first codon of a protein-coding (sub)sequence. | |
Creation of lists of sequences, species or keywords. | |
RaaList * | processQuery (const std::string &query, const std::string &listname) throw (std::string) |
Returns the list of database elements (often sequences) matching a query. | |
RaaList * | createEmptyList (const std::string &listname, const std::string &kind=RaaList::LIST_SEQUENCES) throw (int) |
Creates an empty list with specified name. | |
void | deleteList (RaaList *list) |
Deletes a list and calls its destructor. | |
Access to feature table-defined sequences (nucleotide databases only). | |
RaaList * | getDirectFeature (const std::string &seqname, const std::string &featurekey, const std::string &listname, const std::string &matching="") |
Computes the list of subsequences of a given sequence corresponding to a given feature key with optional annotation string matching. | |
vector< std::string > | listDirectFeatureKeys () |
Gives all feature keys of the database that can be directly accessed. | |
vector< std::string > | listAllFeatureKeys () |
Gives all feature keys of the database. | |
void * | prepareGetAnyFeature (int seqrank, const std::string &featurekey) throw (std::string) |
Starts extraction of all features of a specified key present in the feature table of a database sequence. | |
Sequence * | getNextFeature (void *opaque) |
Successively returns features specified in a previous prepareGetAnyFeature() call. | |
void | interruptGetAnyFeature (void *opaque) |
Terminates a feature extraction session initiated by a prepareGetAnyFeature() call before a getNextFeature() call has returned NULL. | |
Browsing database species and keywords. | |
RaaSpeciesTree * | loadSpeciesTree (bool showprogress=true) |
Loads the database's full species tree classification. | |
void | freeSpeciesTree (RaaSpeciesTree *tree) |
Frees the memory occupied by the species tree classification. | |
int | keywordPattern (const std::string &pattern) |
Initializes pattern-matching in database keywords. Matching keywords are then returned by successive nextMatchingKeyword() calls. | |
int | nextMatchingKeyword (std::string &matching) |
Finds next matching keyword in database. | |
Protected Attributes | |
raa_db_access * | raa_data |
Friends | |
class | RaaList |
This class provides network access to several nucleotide and protein sequence databases structured for multi-criteria retrieval under the ACNUC system as described in Remote access to ACNUC nucleotide and protein sequence databases at PBIL.
The list of available databases is here. EMBL and GenBank are updated daily; SwissProt (in fact UniProt, which includes SwissProt and trEMBL) is updated at each partial release; EMBLwgs is updated at each full release (that is, quarterly).
Single sequences can be accessed by name or accession number, and lists of sequences can be built from queries combining several retrieval criteria. Any fragment of any sequence, defined by coordinates or by feature table entries, can be retrieved. Sequence annotations are accessible, and several databases can be accessed concurrently.
Both database entries and subsequences can be accessed. A subsequence is one or more fragments of one or more parent sequences, defined by a feature table entry. Subsequences are named by adding an extension (e.g., .PE1) to the name of their parent sequence.
bpp::RAA::RAA | ( | const std::string & | dbname, | |
int | port = 5558, | |
const std::string & | server = "pbil.univ-lyon1.fr" | |||
) | throw (int) |
Direct constructor: opens a network connection to a database.
dbname | The database name (e.g., "embl", "genbank", "swissprot"). | |
port | The IP port number of the server (the default value is a safe choice; make sure that no firewall blocks outbound connections on this port). | |
server | The IP name of the server (the default value is a safe choice). |
int | An error code as follows: 1: incorrect server name; 2: cannot create connection with server; 3: unknown database name; 4: database is currently not available for remote connection; 7: not enough memory. |
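The error codes listed above can be turned into readable messages with a small lookup. This is a minimal sketch: the function name `describeRaaError` is hypothetical and not part of the RAA class, and the fallback text for unlisted codes is an assumption.

```cpp
#include <string>

// Maps the integer thrown by the RAA constructor to the messages documented
// for it. describeRaaError is a hypothetical helper, not part of the library.
std::string describeRaaError(int code) {
    switch (code) {
        case 1: return "incorrect server name";
        case 2: return "cannot create connection with server";
        case 3: return "unknown database name";
        case 4: return "database is currently not available for remote connection";
        case 7: return "not enough memory";
        default: return "unrecognized error code";  // assumption: codes not listed
    }
}

// Intended use (requires the RAA library and network access):
//   try { bpp::RAA db("embl"); /* work with db */ }
//   catch (int code) { std::cerr << describeRaaError(code) << std::endl; }
```

Catching by `int` matches the `throw (int)` specification shown in the constructor signature.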
bpp::RAA::RAA | ( | int | port = 5558, | |
const std::string & | server = "pbil.univ-lyon1.fr" | |||
) | throw (int) |
Direct constructor: opens a network connection to a database server, without specifying a database.
Typical usage is to ask with knownDatabases() for the list of served databases, and then to open the chosen database with openDatabase().
port | The IP port number of the server (the default value is a safe choice; make sure that no firewall blocks outbound connections on this port). | |
server | The IP name of the server (the default value is a safe choice). |
int | An error code as follows: 1: incorrect server name; 2: cannot create connection with server; 7: not enough memory. |
void RAA::closeDatabase | ( | ) |
Closes a database connection.
Another database can then be opened with openDatabase() using the same RAA object.
RaaList* bpp::RAA::createEmptyList | ( | const std::string & | listname, | |
const std::string & | kind = RaaList::LIST_SEQUENCES | |||
) | throw (int) |
Creates an empty list with specified name.
listname | A name to be given to the resulting list. Case is not significant. | |
kind | Nature of the resulting list. One of RaaList::LIST_SEQUENCES, RaaList::LIST_KEYWORDS, RaaList::LIST_SPECIES. |
int | 3: a list with the same name already exists; it is left unchanged. 4: the server cannot create more lists. |
void RAA::deleteList | ( | RaaList * | list | ) |
void RAA::freeSpeciesTree | ( | RaaSpeciesTree * | tree | ) |
Frees the memory occupied by the species tree classification.
tree | An object previously returned by a loadSpeciesTree() call. It is deleted upon return. |
string RAA::getAnnotLineAtAddress | ( | RaaAddress | address | ) |
Returns the annotation line at the given address.
address | Information identifying the position of an annotation line typically obtained from a previous call to getCurrentAnnotAddress(). |
RaaSeqAttributes * RAA::getAttributes | ( | int | seqrank | ) |
Returns several attributes of a sequence from its database rank.
seqrank | The database rank of a sequence. |
RaaSeqAttributes* bpp::RAA::getAttributes | ( | const std::string & | name_or_accno | ) |
Returns several attributes of a sequence from its name or accession number.
name_or_accno | A sequence name or accession number. Case is not significant. |
RaaAddress RAA::getCurrentAnnotAddress | ( | ) |
Returns information identifying the position of the last read annotation line.
RaaList* bpp::RAA::getDirectFeature | ( | const std::string & | seqname, | |
const std::string & | featurekey, | |||
const std::string & | listname, | |||
const std::string & | matching = "" | |||
) |
Computes the list of subsequences of a given sequence corresponding to a given feature key with optional annotation string matching.
This function retrieves all features of the given sequence that correspond to a given feature key and whose annotation optionally contains a given string.
Example:
getDirectFeature("AE005174", "tRNA", "mytrnas", "anticodon: TTG")
retrieves all tRNA features present in the feature table of sequence AE005174 that contain the string "anticodon: TTG" in their annotations, and puts that in a sequence list called "mytrnas". This function is meaningful with nucleotide sequence databases only (not with protein databases).
seqname | The name of a database sequence. Case is not significant. | |
featurekey | A feature key (e.g., CDS, tRNA, ncRNA) that must be directly accessible, that is, one of those returned by listDirectFeatureKeys(). Case is not significant. | |
listname | The name to give to the resulting sequence list. | |
matching | An optional string required to be present in the feature's annotations. Case is not significant. |
string RAA::getFirstAnnotLine | ( | int | seqrank | ) |
Returns the first annotation line of the sequence of given database rank.
seqrank | Database rank of a sequence. |
string RAA::getNextAnnotLine | ( | ) |
Returns the next annotation line after that previously read, or NULL if the end of the database file was reached.
Sequence * RAA::getNextFeature | ( | void * | opaque | ) |
Successively returns features specified in a previous prepareGetAnyFeature() call.
This function must be called repeatedly until it returns NULL or until interruptGetAnyFeature() is called. Features are processed in their order of appearance in the feature table.
opaque | A pointer returned by a previous prepareGetAnyFeature() call. |
Sequence * RAA::getSeq | ( | int | seqrank, | |
int | maxlength = 100000 | |||
) |
Returns a sequence identified by its database rank.
Because nucleotide database sequences can be several megabases in length, the maxlength argument avoids unexpected huge sequence downloads.
seqrank | The database rank of a sequence. | |
maxlength | The maximum sequence length beyond which the function returns NULL. |
Sequence* bpp::RAA::getSeq | ( | const std::string & | name_or_accno, | |
int | maxlength = 100000 | |||
) |
Returns a database sequence identified by name or accession number.
Because nucleotide database sequences can be several megabases in length, the maxlength argument avoids unexpected huge sequence downloads.
name_or_accno | A sequence name or accession number. Case is not significant. | |
maxlength | The maximum sequence length beyond which the function returns NULL. |
int bpp::RAA::getSeqFrag | ( | const std::string & | name_or_accno, | |
int | first, | |||
int | length, | |||
std::string & | sequence | |||
) |
Returns any part of a sequence identified by its name or accession number.
name_or_accno | The name or accession number of a sequence. Case is not significant. | |
first | The first desired position within the sequence (1 is the smallest valid value). | |
length | The desired number of residues (can be larger than what exists in the sequence). | |
sequence | Filled upon return with requested sequence data. |
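The coordinate convention of getSeqFrag() (1-based `first`, `length` allowed to run past the end) can be illustrated on an in-memory string. The sketch below is a local emulation, not the library's implementation: `fragmentOf` is a hypothetical name, and returning an empty string for an invalid `first` is an assumption, since the behavior in that case is not stated here.

```cpp
#include <string>

// Local illustration of the documented convention: positions start at 1 and a
// length that exceeds the available residues is clipped to the sequence end.
// fragmentOf is hypothetical and does not contact any database.
std::string fragmentOf(const std::string& residues, int first, int length) {
    if (first < 1 || length <= 0) return "";  // assumption: invalid request
    std::string::size_type start = static_cast<std::string::size_type>(first - 1);
    if (start >= residues.size()) return "";  // assumption: start past the end
    // substr() clips the count to what remains in the string.
    return residues.substr(start, static_cast<std::string::size_type>(length));
}
```

With the real class, the corresponding call would be `db.getSeqFrag(name_or_accno, first, length, sequence)`, with the residues delivered in `sequence`.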
int bpp::RAA::getSeqFrag | ( | int | seqrank, | |
int | first, | |||
int | length, | |||
std::string & | sequence | |||
) |
Returns any part of a sequence identified by its database rank.
seqrank | The database rank of a sequence. | |
first | The first desired position within the sequence (1 is the smallest valid value). | |
length | The desired number of residues (can be larger than what exists in the sequence). | |
sequence | Filled upon return with requested sequence data. |
void RAA::interruptGetAnyFeature | ( | void * | opaque | ) |
Terminates a feature extraction session initiated by a prepareGetAnyFeature() call before a getNextFeature() call has returned NULL.
opaque | A pointer returned by a previous prepareGetAnyFeature() call. |
int bpp::RAA::keywordPattern | ( | const std::string & | pattern | ) |
Initializes pattern-matching in database keywords. Matching keywords are then returned by successive nextMatchingKeyword() calls.
pattern | A pattern-matching string using @ as wildcard (example: RNA@polymerase@). Case is not significant. |
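To get a feel for which keywords a pattern such as RNA@polymerase@ would select, the matching can be emulated locally. This sketch assumes that @ stands for any run of characters, including an empty one, and that comparison ignores ASCII case; the server's exact matching rules may differ. `matchesKeywordPattern` is a hypothetical helper, not part of the RAA class.

```cpp
#include <cctype>
#include <string>

// Lower-cases one ASCII character safely.
static char toLowerAscii(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical local matcher: '@' is treated as "any run of characters"
// (assumption), and letters are compared without regard to case.
bool matchesKeywordPattern(const std::string& pattern, const std::string& keyword) {
    std::string::size_type p = 0, k = 0;
    std::string::size_type lastWildcard = std::string::npos, resume = 0;
    while (k < keyword.size()) {
        if (p < pattern.size() && pattern[p] == '@') {
            lastWildcard = p++;  // remember the wildcard and try matching zero chars
            resume = k;
        } else if (p < pattern.size() &&
                   toLowerAscii(pattern[p]) == toLowerAscii(keyword[k])) {
            ++p;
            ++k;
        } else if (lastWildcard != std::string::npos) {
            p = lastWildcard + 1;  // backtrack: let the wildcard absorb one more char
            k = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '@') ++p;  // trailing wildcards
    return p == pattern.size();
}
```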
int bpp::RAA::knownDatabases | ( | vector< std::string > & | name, | |
vector< std::string > & | description | |||
) |
Computes the list of names and descriptions of databases served by the server.
Typically used after creating an RAA object without a database and before calling openDatabase().
name | Vector of database names. Any of these names can be used in openDatabase() calls. | |
description | Vector of database descriptions. A description can begin with "(offline)" to mean the database is currently not available. |
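Because a description beginning with "(offline)" marks an unavailable database, the two vectors filled by knownDatabases() can be filtered before choosing what to open. The helper below is a sketch; `onlineDatabases` is a hypothetical name, and it assumes the two vectors are parallel (entry i of one describes entry i of the other).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the names whose description does not begin with "(offline)".
// Assumes name[i] and description[i] refer to the same database.
std::vector<std::string> onlineDatabases(const std::vector<std::string>& name,
                                         const std::vector<std::string>& description) {
    const std::string marker = "(offline)";
    std::vector<std::string> online;
    for (std::size_t i = 0; i < name.size() && i < description.size(); ++i) {
        // compare() is non-zero unless the description starts with the marker.
        if (description[i].compare(0, marker.size(), marker) != 0) {
            online.push_back(name[i]);
        }
    }
    return online;
}

// Intended use (requires the RAA library and network access):
//   bpp::RAA server;                       // connection without a database
//   std::vector<std::string> names, descriptions;
//   server.knownDatabases(names, descriptions);
//   std::vector<std::string> usable = onlineDatabases(names, descriptions);
//   if (!usable.empty()) server.openDatabase(usable[0]);
```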
vector< string > RAA::listAllFeatureKeys | ( | ) |
Gives all feature keys of the database.
These feature keys (e.g., CDS, conflict, misc_feature) can be used with function prepareGetAnyFeature(). This function is meaningful with nucleotide sequence databases only (not with protein databases).
vector< string > RAA::listDirectFeatureKeys | ( | ) |
Gives all feature keys of the database that can be directly accessed.
These feature keys (e.g., CDS, rRNA, tRNA) can be used with function getDirectFeature(). This function is meaningful with nucleotide sequence databases only (not with protein databases).
RaaSpeciesTree * RAA::loadSpeciesTree | ( | bool | showprogress = true |
) |
Loads the database's full species tree classification.
This call takes a few seconds to run on large databases because a large amount of data is downloaded from the server.
showprogress | If true, progress information gets sent to stdout. |
int bpp::RAA::nextMatchingKeyword | ( | std::string & | matching | ) |
Finds next matching keyword in database.
matching | Set to the next matching keyword upon return. |
int bpp::RAA::openDatabase | ( | const std::string & | dbname, | |
char *(*)(void *) | getpasswordf = NULL, | |
void * | p = NULL | |||
) |
Opens a database from its name.
dbname | The database name (e.g., "embl", "genbank", "swissprot"). | |
getpasswordf | NULL, or, for a password-protected database, pointer to a password-providing function that returns the password as a writable static char string. | |
p | NULL, or pointer to data transmitted as argument of getpasswordf. |
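A password-providing function has to match the type `char *(*)(void *)` and return a writable static string. The sketch below shows one possible shape: `passwordFromContext` is a hypothetical name, the 128-byte buffer size is arbitrary, and treating `p` as a C string is an assumption about what the caller chooses to pass.

```cpp
#include <cstring>

// Hypothetical callback compatible with the getpasswordf parameter.
// It copies the C string passed through p into a static, writable buffer.
char* passwordFromContext(void* p) {
    static char buffer[128];  // static storage, as the documentation requires
    const char* source = static_cast<const char*>(p);
    if (source == NULL) {
        buffer[0] = '\0';  // assumption: no context means an empty password
        return buffer;
    }
    std::strncpy(buffer, source, sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';  // strncpy does not always terminate
    return buffer;
}

// Intended use (requires the RAA library; "mydb" is a placeholder name):
//   char secret[] = "...";
//   db.openDatabase("mydb", passwordFromContext, secret);
```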
void* bpp::RAA::prepareGetAnyFeature | ( | int | seqrank, | |
const std::string & | featurekey | |||
) | throw (std::string) |
Starts extraction of all features of a specified key present in the feature table of a database sequence.
A database sequence can contain many instances of a given feature key in its feature table. Feature extraction therefore proceeds in two steps: first prepare the extraction, then retrieve features one at a time with getNextFeature() calls until none remain in the feature table or until interruptGetAnyFeature() is called. Any successful prepareGetAnyFeature() call must be followed by getNextFeature() calls until one returns NULL, or by a call to interruptGetAnyFeature(); calling any other RAA member function in between is prohibited.
seqrank | The database rank of a sequence. | |
featurekey | Any feature key (direct or not) defined in EMBL/GenBank/DDBJ feature tables. These are also returned by listAllFeatureKeys(). |
string | A message indicating the cause of the error. |
RaaList* bpp::RAA::processQuery | ( | const std::string & | query, | |
const std::string & | listname | |||
) | throw (std::string) |
Returns the list of database elements (often sequences) matching a query.
Query examples:
k=ribosomal protein L14
sp=felis catus and t=cds
query | A retrieval query following the syntax described here. | |
listname | A name to be given to the resulting list. Case is not significant. If a list with same name already exists, it is replaced by the new list. |
string | If error, the string is a message describing the error cause. |
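The query examples above combine `field=value` criteria with `and`. A small builder can keep such strings consistent. This sketch covers only the forms shown in the examples; `criterion` and `bothOf` are hypothetical helpers, and the full query syntax supports more than is modeled here.

```cpp
#include <string>

// Builds one criterion such as "sp=felis catus" (hypothetical helper).
std::string criterion(const std::string& field, const std::string& value) {
    return field + "=" + value;
}

// Joins two criteria with "and", as in the second example query.
std::string bothOf(const std::string& left, const std::string& right) {
    return left + " and " + right;
}

// Intended use (requires the RAA library and an open database; "mylist" is a
// placeholder list name):
//   try {
//       RaaList* list = db.processQuery(
//           bothOf(criterion("sp", "felis catus"), criterion("t", "cds")), "mylist");
//   } catch (std::string message) { std::cerr << message << std::endl; }
```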
Sequence* bpp::RAA::translateCDS | ( | const std::string & | name | ) | throw (BadCharException) |
Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.
name | The name of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS. |
BadCharException | In rare cases, the CDS may contain an internal stop codon that raises an exception when translated to protein. |
Sequence * RAA::translateCDS | ( | int | seqrank | ) | throw (BadCharException) |
Returns the full protein translation of a protein-coding nucleotide database (sub)sequence.
seqrank | The database rank of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS. |
BadCharException | In rare cases, the CDS may contain an internal stop codon that raises an exception when translated to protein. |
char RAA::translateInitCodon | ( | int | seqrank | ) |
Returns the amino acid translation of the first codon of a protein-coding (sub)sequence.
seqrank | The database rank of a protein-coding sequence. It can be either a subsequence corresponding to a CDS feature table entry, or a sequence if all of it belongs to the CDS. |