ACNUC remote access protocol


Sorted list of functions: acnucclose, acnucopen, alllistranks, bcount, bit0, bit1, btest, clientid, copylist, countfreelists, countsubseqs, crelistfromclientdata, extractseqs, fcode, getannots, getattributes, getemptylist, getlistrank, getliststate, gfrag, ghelp, iknum, isenum, knowndbs, loadtaxonomy, modifylist, next_annots, nexteltinlist, nextmatchkey, prep_getannots, prep_requete, prettyseq, proc_query, proc_requete, quit, read_annots, readacc, readaut, readbib, readext, readfirstrec, readkey, readlng, readloc, readshrt, readsmj, readspec, readsub, releaselist, residuecount, savelist, selseqs1node, seq_to_annots, setlistname, setliststate, zerolist, zlibloadtaxonomy
Functions by topic
start: knowndbs, acnucopen, acnucclose, clientid, quit
query the database: proc_query, modifylist, fcode, iknum, selseqs1node
get annotations & sequences: isenum, getattributes, read_annots, next_annots, gfrag, prettyseq, seq_to_annots
manage lists: alllistranks, nexteltinlist, bcount, bit1, bit0, btest, getemptylist, getlistrank, zerolist, releaselist, countfreelists, setlistname, setliststate, getliststate, copylist, savelist, crelistfromclientdata
special purpose: extractseqs, (zlib)loadtaxonomy, nextmatchkey, ghelp, residuecount, countsubseqs
low level: readacc, readaut, readbib, readkey, readlng, readext, readloc, readshrt, readsmj, readspec, readsub, readfirstrec, bit1, bit0, btest, bcount

Remote access to acnuc databases works by opening a socket on port 5558 of the acnuc server and communicating on this socket following the protocol described here.

client opens a socket to the server, typically on port 5558
client receives from server on socket: OK acnuc socket started\n (\n indicates end-of-line)
client sends to server on socket: clientid&id="client_name"\n and receives code=0\n
client sends to server on socket: acnucopen&db=embl\n
client receives from server on socket: code=0&type=EMBL&totseqs=31722973&totspecs=224976&totkeys=1148875\n
client sends to server on socket: gfrag&name=J01714&start=1&length=50\n
client receives from server on socket: length=50&aacctttccggtcgcggagataaagacatcttcaccgttcacgatatttt\n
send/receive pairs are repeated as needed
client sends to server on socket: quit\n, receives OK acnuc socket closed\n and closes the socket.
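The exchange above can be replayed with a short Python sketch. It is only an illustration: the host name is a parameter because this text does not give it, the client id "example_client" is made up, and decoding the socket as latin-1 is an assumption (any byte-transparent decoding would do). The naive reply splitter only works for replies without quoted values.

```python
import socket

def parse_simple_reply(line):
    """Turn 'code=0&type=EMBL&totseqs=31722973' into a dict of strings.

    Only suitable for replies without quoted values (quoted values may
    contain '&', which this naive split does not handle).
    """
    fields = {}
    for item in line.rstrip("\n").split("&"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields

def run_example_session(host, port=5558):
    """Replay the example exchange against an acnuc server at host:port."""
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("rw", encoding="latin-1", newline="\n") as conn:

        def exchange(command):
            conn.write(command + "\n")   # every command ends with a newline
            conn.flush()
            return conn.readline()       # each reply used here is one line

        print(conn.readline(), end="")   # OK acnuc socket started
        print(exchange('clientid&id="example_client"'), end="")
        info = parse_simple_reply(exchange("acnucopen&db=embl"))
        if info.get("code") != "0":
            raise RuntimeError("acnucopen failed: %r" % info)
        print("sequences in db:", info["totseqs"])
        # gfrag replies 'length=N&residues': split only at the first '&'
        reply = exchange("gfrag&name=J01714&start=1&length=50")
        header, _, residues = reply.rstrip("\n").partition("&")
        print(header, residues)
        print(exchange("quit"), end="")  # OK acnuc socket closed
```

Calling run_example_session with the server's host name prints each reply in turn; the socket and its file wrapper are both closed when the with block ends.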

==> command&args <newline>      command + arguments + end-of-line sent to server
<== string <newline>      one or more lines of text received from server
[ arg1 | arg2 ]      alternative arguments
{ arg }      optional argument
All argument values (e.g. name="Homo sapiens") can be enclosed in double quotes when useful, but any internal " must then be escaped with \ (as \"); this escape rule is expected from the client and is also applied by the server.
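A minimal sketch of the quoting rule in Python; quote_arg, split_fields and parse_reply are hypothetical helper names, not part of any acnuc library. The parser splits a reply line only at & characters outside double quotes, undoes the \" escape, and records bare flags (such as on/off in btest replies) with the value None. Backslashes not followed by a quote are left untouched, since this text defines no other escape.

```python
def quote_arg(value):
    """Enclose an argument value in double quotes, escaping internal quotes as \\"."""
    return '"' + value.replace('"', '\\"') + '"'

def split_fields(line):
    """Split a reply line at '&' characters that are outside double quotes."""
    text = line.rstrip("\n")
    parts, current, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\" and text[i + 1:i + 2] == '"':
            current.append('\\"')        # keep the escape; undone in parse_reply
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "&" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def parse_reply(line):
    """Map each key of a reply line to its value; bare flags map to None."""
    result = {}
    for part in split_fields(line):
        key, sep, value = part.partition("=")
        if not sep:
            result[key] = None           # e.g. 'on', 'off', 'otheraccessmatches'
        elif len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            result[key] = value[1:-1].replace('\\"', '"')
        else:
            result[key] = value
    return result
```

For example, parse_reply('code=0&free=12&annotlines="ID|AC|DE"') gives {'code': '0', 'free': '12', 'annotlines': 'ID|AC|DE'}.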

Status codes
The status of the reply to a command is generally returned as
<== code=stat_number{ &optional arguments }
Absence of "code=stat_number" in the reply to a command implies success.
stat_number values are


==> acnucopen&db=xxxxx
<== code=2      missing db= argument
<== code=3      no database with that name is known by the server
<== code=4      the database is currently unavailable
<== code=5      a database is currently open and has not been closed
<== code=6&challenge=xx      the database requires password authorization; the server sends a challenge to the client
==> reply=xx      authorization data sent by the client: the MD5 digest of the string "challenge:dbname:md5-pw", where md5-pw is the MD5 digest of the password
<== code=6      authorization failed
<== code=0&type=[GENBANK|EMBL|SWISSPROT|NBRF]&totseqs=xx&totspecs=xx&totkeys=xx&ACC_LENGTH=xx&L_MNEMO=xx&WIDTH_KW=xx&WIDTH_SP=xx&WIDTH_SMJ=xx&WIDTH_AUT=xx&WIDTH_BIB=xx&lrtxt=xx&SUBINLNG=xx      success
Initiates remote access to an acnuc database. The db= argument identifies the target database by a logical name: any dbname returned by the knowndbs command, a name taken from the 1st column of this table, or the name of a database requiring password authorization.
type: the type of the opened database
totseqs, totspecs, totkeys: total numbers of sequences, species and keywords in the opened database
ACC_LENGTH, L_MNEMO, WIDTH_KW, WIDTH_SP, WIDTH_SMJ, WIDTH_AUT, WIDTH_BIB, lrtxt, SUBINLNG: maximum lengths of record keys in the database
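The challenge reply can be computed as below. The text does not say how the digests are encoded, so this sketch assumes lowercase hexadecimal MD5 digests (32 characters) and UTF-8 encoding of the strings; acnuc_auth_reply is a hypothetical name, and both assumptions should be checked against the server before relying on them.

```python
import hashlib

def acnuc_auth_reply(challenge, dbname, password):
    """Value to send as reply=... after acnucopen answers code=6&challenge=...

    Assumptions (not stated in the protocol text): digests are written as
    lowercase hexadecimal strings and the strings are encoded as UTF-8.
    """
    md5_pw = hashlib.md5(password.encode("utf-8")).hexdigest()
    message = "%s:%s:%s" % (challenge, dbname, md5_pw)
    return hashlib.md5(message.encode("utf-8")).hexdigest()
```

After receiving code=6&challenge=..., the client would send "reply=" followed by acnuc_auth_reply(challenge, dbname, password) and a newline.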

==> clientid&id="xxxxx"
<== code=0
Sends the server an identification of the client, typically a program name.

==> acnucclose
<== code=xx
Closes the currently open acnuc database. code: 0 if OK, 3 if no database was opened by the server.

==> quit
<== OK acnuc socket closed
Closes the socket and stops communication over it.

==> gfrag&[number=xx|name=xx]&start=xx&length=xx
<== length=xx&....sequence...
Gets length characters from the sequence identified by name or by number, starting from position start (counted from 1). The reply gives the length actually read (possibly shorter than requested), followed by the characters; length can be 0 on any error.
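Because a gfrag reply may be shorter than requested, a client fetching a long sequence typically loops, advancing start by the length actually returned. In this sketch, exchange is a hypothetical callable that sends one command line and returns one reply line, total_length would come from a command such as isenum, and the 10000-residue chunk size is arbitrary.

```python
def parse_gfrag(line):
    """Split a gfrag reply 'length=N&residues' into (N, residues)."""
    header, _, residues = line.rstrip("\n").partition("&")
    key, _, value = header.partition("=")
    if key != "length":
        raise ValueError("unexpected gfrag reply: %r" % line)
    length = int(value)
    return length, residues[:length]

def fetch_sequence(exchange, name, total_length, chunk=10000):
    """Read a whole sequence with successive gfrag commands.

    exchange: hypothetical callable sending one command line and returning
    one reply line. Stops early if the server answers length=0 (error).
    """
    parts, start = [], 1
    while start <= total_length:
        want = min(chunk, total_length - start + 1)
        reply = exchange("gfrag&name=%s&start=%d&length=%d" % (name, start, want))
        got, residues = parse_gfrag(reply)
        if got == 0:
            break
        parts.append(residues)
        start += got          # the reply may be shorter than requested
    return "".join(parts)
```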

==> read_annots&[number=xx|name=xx|offset=xx&div=xx]{&nl=xx}
<== nl=xx&...1 or several lines...
Reads nl (default 1) consecutive annotation lines identified by offset and div(ision), by sequence number, or by sequence name. Reading stops when nl lines have been transmitted or at the last annotation line of the (sub)sequence (SQ or ORIGIN line; end of the feature table entry for a subsequence). The reply gives the number of lines sent, followed by these lines.

==> next_annots{&nl=xx}
<== nl=xx&offset=xx&...1 or several lines...
Reads nl (default 1) consecutive annotation lines following the previously read ones. Reading stops when nl lines have been transmitted or at the last annotation line of the sequence (SQ or ORIGIN line). The reply gives the number of lines read, the offset of the first line read, and then the lines.

==> seq_to_annots&[number=xx|name=xx]
<== code=xx&offset=xx&div=xx
Returns the information needed to read the annotations (offset and div) of a sequence identified by name or by number. The reply has code != 0 iff an error occurred.

==> countfreelists
<== code=xx&free=xx&annotlines="xx"
Returns the number of free lists available.
code: 0 iff OK
free: number of free lists available
annotlines: names of the annotation lines in the opened database, separated by |

==> prep_requete
This is another, equivalent name for the countfreelists function.

==> proc_query&query="......."&name="xx"
in case of error:
<== code=xx&message="xx"
in case of success:
<== code=0&lrank=xx&count=xx&type=[SQ|KW|SP]{&locus=[T|F]}
Processes an acnuc query and puts the result in a list with the specified name, overwriting any existing list with the same name. The query must follow the ACNUC query language. On success, code is 0 and the reply gives lrank, the rank of the resulting list; count, the number of elements in the list; type, the type of the list (SQ, sequences; SP, species; KW, keywords); and, for sequence lists, whether the list contains only parent sequences (locus=T). On error, code is != 0 and message is a text describing the error.
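A sketch of building a proc_query command with both arguments quoted, and of interpreting the success reply. The function names are hypothetical; the reply is assumed to have already been split into a dict of strings (for instance with a parser that honours quoted values). The query text used in the example below is only an illustration of the ACNUC query language.

```python
def quote(value):
    """Double-quote a value, escaping internal quotes as the protocol requires."""
    return '"' + value.replace('"', '\\"') + '"'

def build_proc_query(query, list_name):
    """Command line asking the server to run query into list list_name."""
    return "proc_query&query=%s&name=%s" % (quote(query), quote(list_name))

def interpret_query_reply(fields):
    """Check a parsed proc_query reply and convert its fields.

    fields: dict of strings, e.g. {'code': '0', 'lrank': '3', 'count': '12',
    'type': 'SQ', 'locus': 'F'}; raises on a non-zero code.
    """
    if fields.get("code") != "0":
        raise RuntimeError("query failed (code %s): %s"
                           % (fields.get("code"), fields.get("message", "")))
    return {
        "lrank": int(fields["lrank"]),
        "count": int(fields["count"]),
        "type": fields["type"],                      # SQ, SP or KW
        "parents_only": fields.get("locus") == "T",  # meaningful for SQ lists only
    }
```

For example, build_proc_query("sp=Homo sapiens", "human") produces proc_query&query="sp=Homo sapiens"&name="human".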

==> proc_requete&query="......."&name="xx"
This is another, equivalent name for the proc_query function.

==> nexteltinlist&lrank=xx&first=xx{&count=xx}
<== next=xx&name="xx"{&length=xx&offset=xx&div=xx&frame=xx&ncbigc=xx}
Finds the next element(s) after element first in the list identified by its rank (lrank argument); set first=1 to start running through the list. next is the element value, or 0 if no more elements exist; name is the name of this element, which can be a sequence, a species or a keyword; for a sequence, the reply also gives its length, the division and offset of its annotations, its reading frame and its NCBI genetic code id. With &count=xx, at most count elements are returned on successive lines (next=0 indicates that fewer than count lines are returned); without it, exactly one line is returned.
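Running through a whole list with nexteltinlist can be sketched as follows, assuming send_command and read_line are hypothetical callables wrapping the socket. Each batch restarts from the last element received, and the loop stops at the first next=0 line, which marks the end of the list. The small regular-expression parser only handles the key=value fields used here and ignores bare flags.

```python
import re

_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^&]*)')

def fields(line):
    """key=value pairs of one reply line; quoted values are unquoted."""
    out = {}
    for key, raw in _FIELD.findall(line.rstrip("\n")):
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1].replace('\\"', '"')
        out[key] = raw
    return out

def iterate_list(send_command, read_line, lrank, batch=100):
    """Yield (element, name) for every element of the list of rank lrank."""
    current = 1                          # first=1 starts at the beginning of the list
    while True:
        send_command("nexteltinlist&lrank=%d&first=%d&count=%d" % (lrank, current, batch))
        for _ in range(batch):           # at most batch lines come back
            reply = fields(read_line())
            current = int(reply["next"])
            if current == 0:
                return                   # next=0: no more elements
            yield current, reply.get("name")
```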

==> getliststate&lrank=xx
<== code=xx&type=[SQ|KW|SP]&name="xx"&count=xx{&locus=[T|F]}
Asks for information about the list of specified rank. The reply gives the type of the list, its name, the number of elements it contains and, for sequence lists, whether it contains only parent sequences (locus=T). code != 0 indicates an error.

==> setliststate&lrank=xx{&locus=[T|F]}{&type=[SQ|KW|SP]}
<== code=xx
Sets the type and/or the "locus value" of the specified list. code != 0 indicates an error.

==> getlistrank&name="xx"
<== lrank=xx
Returns the rank of the list with the given name, or 0 if no such list exists.

==> setlistname&lrank=xx&name="xx"
<== code=xx
Sets the name of a list identified by its rank. code: 0 if OK; 3 if another list with that name already existed and was deleted; 4 if no list with that rank exists.

==> getemptylist&name="xx"
<== code=xx&lrank=xx
Creates a new, empty list, sets its name, and returns its rank and a status code. code: 0 if OK; 3 if the name is already used by another list, whose rank is given (nothing is changed); 4 if no empty list exists (no lrank value is returned).

==> releaselist&lrank=xx
<== code=xx
Releases the resources associated with the list of specified rank; the list no longer exists afterwards. code != 0 indicates an error.

==> residuecount&lrank=xx
<== code=xx&count=xx
Computes the total number of residues (nucleotides or amino acids) in all sequences of the list of specified rank. code != 0 indicates an error.

==> selseqs1node&num=xx&kind=[SP|HO|KW]
<== lrank=xx{&count=xx}
Creates a list of the sequences attached to the species (kind=SP), host (HO) or keyword (KW) of number num. Returns the rank of the created list (0 if error) and the count of sequences in it.

==> alllistranks
<== count=xx&n1,n2,...
Returns the number of existing lists and all their ranks, separated by commas.

==> bcount&lrank=xx
<== code=xx&count=xx
Counts the number of elements in the list of rank lrank. code != 0 indicates an error.

==> bit1&lrank=xx&num=xx
<== code=xx
Adds element num to the list of rank lrank. code != 0 indicates an error.

==> bit0&lrank=xx&num=xx
<== code=xx
Removes element num from the list of rank lrank. code != 0 indicates an error.

==> btest&lrank=xx&num=xx
<== code=0&[on|off]
Tests for the presence of element num in the list of rank lrank: on means present, off means absent. code != 0 indicates an error.

==> copylist&lfrom=xx&lto=xx
<== code=xx
Copies the list of rank lfrom to the list of rank lto, which must have been previously allocated, e.g. by getemptylist or proc_query. code != 0 indicates an error.

==> zerolist&lrank=xx
<== code=xx
Empties the list of specified rank, which must have been previously allocated, e.g. by getemptylist or proc_query. code != 0 indicates an error.

==> countsubseqs&lrank=xx
<== code=xx&count=xx
Returns the number of subsequences in the list of rank lrank. code != 0 indicates an error.

==> isenum&[name=xx|access=xx]
<== number=xx{&length=xx&frame=xx&gencode=xx&ncbigc=xx}{&otheraccessmatches}
Finds the acnuc number of a sequence from its name (name= argument) or its accession number (access= argument); both arguments are case-insensitive. The reply gives the number (0 if no such sequence exists), the length, the reading frame (0, 1, or 2) and the genetic code ids of the sequence: gencode= gives acnuc's genetic code (0 means universal), ncbigc= gives NCBI's genetic code id (1 means universal). When &otheraccessmatches appears in the reply, several sequences are attached to the given accession number, and number= gives only the acnuc number of the first attached sequence.

==> iknum&name="xx"&type=[SP|KW]
<== num=xx
Finds the acnuc number of a species (type=SP) or a keyword (type=KW). Returns 0 if it does not exist.

==> fcode&name="xx"&type=[AUT|BIB|ACC|SMJ|SUB]
<== num=xx
Finds the acnuc number of an author (type=AUT), a reference (BIB), an accession number (ACC), a record of the SMJYT index file (SMJ) or a sequence (SUB). Returns 0 if it does not exist.

==> readsub&num=xx
<== code=xx&name="xx"&length=xx&type=xx&is_sub=xx&toext=xx&plkey=xx&frame=xx&genet=xx&ncbigc=xx
Returns data for the sequence of number num:
name, length: the name and length of the sequence
type: the number of the sequence type
is_sub: 0 for a subsequence; for a parent sequence, the rank of the corresponding LOCUS record
toext: the (positive) value of the pext field
plkey: the start of the short list of attached keywords
frame: the reading frame (0, 1, or 2; meaningful only for CDSs)
genet: the acnuc genetic code id (0 means universal genetic code)
ncbigc: the NCBI genetic code id (1 means universal genetic code)
code != 0 indicates an error.

==> readloc&num=xx
<== code=xx&sub=xx&pnuc=xx&pinf=xx&spec=xx&host=xx&plref=xx&molec=xx&placc=xx&org=xx&date=xx
Returns data from the record of rank num of file LOCUS. code != 0 indicates an error.

==> readspec&num=xx
<== code=xx&name="xx"&plsub=xx&desc=xx&syno=xx&host=xx{&libel="xx"}
Returns data from the record of rank num of file SPECIES, including the label when not empty. code != 0 indicates an error.

==> readkey&num=xx
<== code=xx&name="xx"&plsub=xx&desc=xx&syno=xx{&libel="xx"}
Returns data from the record of rank num of file KEYWORDS, including the label when not empty. code != 0 indicates an error.

==> readsmj&num=xx&nl=xx
<== code=xx&nl=xx
<== recnum=xx&name="xx"&plong=xx{&libel="xx"}
... a series of nl lines like that ...
Returns data from nl consecutive records of file SMJYT, starting from rank num, including the label when not empty. code != 0 indicates an error.

==> readext&num=xx
<== code=xx&mere=xx&debut=xx&fin=xx&next=xx
Returns data from the record of rank num of file EXTRACT. code != 0 indicates an error.

==> readlng&num=xx
<== code=xx&n=xx&xx,xx,...{&next=xx}
Reads part of a long list starting at record number num. The reply gives the number n of elements read, then these elements separated by commas, then, if the list is not finished, next, which gives the information needed to continue reading the chain; a single call may thus not read the whole list. code != 0 indicates an error.
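Reading a complete long list means following next from one readlng call to the following one. This sketch assumes that next is the record number to pass as num in the following call (the text only says it gives the information needed to continue) and that the chain ends when next is absent or 0; exchange is a hypothetical one-line request/reply callable.

```python
def parse_readlng(line):
    """Return (elements, next_record) from one readlng reply line.

    next_record is None when the reply carries no next= field (or next=0),
    i.e. the end of the long list has been reached.
    """
    info, elements = {}, []
    for part in line.rstrip("\n").split("&"):
        key, sep, value = part.partition("=")
        if sep:
            info[key] = value
        elif part:
            elements = [int(x) for x in part.split(",") if x]
    if info.get("code") != "0":
        raise RuntimeError("readlng failed with code %s" % info.get("code"))
    next_record = int(info.get("next", "0")) or None
    return elements[:int(info["n"])], next_record

def read_long_list(exchange, first_record):
    """Follow the chain of records starting at first_record and collect all elements."""
    values, record = [], first_record
    while record:
        chunk, record = parse_readlng(exchange("readlng&num=%d" % record))
        values.extend(chunk)
    return values
```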

==> readacc&num=xx
<== code=xx&name="xx"&plsub=xx
Returns data from the record of rank num of file ACCESS. code != 0 indicates an error.

==> readaut&num=xx
<== code=xx&name="xx"&plref=xx
Returns data from the record of rank num of file AUTHOR. code != 0 indicates an error.

==> readbib&num=xx
<== code=xx&name="xx"&j=xx&y=xx{&jname="xx"}{&yname="xx"}&plsub=xx&plaut=xx
Returns data from the record of rank num of file BIBLIO. code != 0 indicates an error.

==> readshrt&num=xx{&max=xx}
<== code=xx&n=xx&xx,xx,...
Returns up to max (default 50) [val,next] pairs of the short list starting at record number num: n gives the number of pairs, which follow. code != 0 indicates an error.

==> readfirstrec&type=[AUT|BIB|ACC|SMJ|SUB|LOC|KEY|SPEC|SHRT|LNG|EXT|TXT]
<== code=xx&count=xx
Returns the record count of the specified ACNUC index file. code != 0 indicates an error.

==> ghelp&file=xx&item=xx
<== nl=xx&...1 or several lines...
Reads one item of information from the specified help file. file can be HELP or HELP_WIN; item is the name of the desired help item. In the reply, nl is 0 if there is any problem; otherwise it gives the number of help lines returned.

==> nextmatchkey&num=xx{&pattern="xx"}{&count=xx}
with the &count=xx argument:
<== code=0&count=xx
<== num=xx&name="xx"      count such lines
without the &count=xx argument:
<== code=xx&num=xx{&name="xx"}
Pattern matching in index file KEYWORDS. Returns the number and name of the next count keywords (one if the &count= argument was not used) matching the pattern after the given number. Use it the first time with num=2 and a pattern, then call it again without specifying the pattern until it returns num=0. In a pattern, @ matches any string (e.g. @polymerase@). Error code 3: not enough memory.
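The iteration protocol (first call with num=2 and a pattern, later calls without it, stop at num=0) can be sketched without the count= argument as below. Passing the last returned num to the following call is an interpretation of "after given number"; exchange is a hypothetical one-line request/reply callable and matching_keywords a hypothetical name.

```python
import re

_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^&]*)')

def fields(line):
    """key=value pairs of one reply line; quoted values are unquoted."""
    out = {}
    for key, raw in _FIELD.findall(line.rstrip("\n")):
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1].replace('\\"', '"')
        out[key] = raw
    return out

def matching_keywords(exchange, pattern):
    """Yield (number, name) for each keyword matching pattern (@ = any string)."""
    command = 'nextmatchkey&num=2&pattern="%s"' % pattern.replace('"', '\\"')
    while True:
        reply = fields(exchange(command))
        if reply.get("code") != "0":
            raise RuntimeError("nextmatchkey failed with code %s" % reply.get("code"))
        num = int(reply["num"])
        if num == 0:
            return                                  # no more matching keywords
        yield num, reply.get("name")
        command = "nextmatchkey&num=%d" % num       # pattern is sent only the first time
```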

==> loadtaxonomy or zlibloadtaxonomy
<== code=0&total=xx\n
<== rank&parent&count&"...name..."{&"...label..."}\n      one such line for each taxon
<== loadtaxonomy END.\n
Sends the client the complete sequence taxonomy of the ACNUC database, compressed with zlib if the zlibloadtaxonomy command is used. The client can interrupt this command by sending the escape ASCII character to the server on the socket; it should then keep reading the socket until "loadtaxonomy END.\n" arrives.
total: (slightly) larger than the number of lines that follow
rank: the rank of a taxon (the root has rank 2)
parent: the rank of its parent (the root's parent is, arbitrarily, 0); synonyms are indicated by a negative parent value
count: the number of sequences directly attached to this taxon
name: the taxon name
label: optionally, a taxon label
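Parsing the uncompressed loadtaxonomy output into dictionaries might look like this. The sketch assumes each taxon line has the form rank&parent&count&"name" with an optional &"label", and that quotes inside names are escaped as \"; it separates synonyms only by the sign of parent, since the text does not say what the absolute value of a negative parent designates. parse_taxonomy and lineage are hypothetical names; lineage walks parent links up to the root, whose parent is 0.

```python
import re

_TAXON = re.compile(r'^(\d+)&(-?\d+)&(\d+)&"((?:[^"\\]|\\.)*)"(?:&"((?:[^"\\]|\\.)*)")?$')

def parse_taxonomy(lines):
    """Return (taxa, synonyms), each {rank: (parent, count, name, label)}.

    lines must start after the code=0&total=... header; reading stops at
    the 'loadtaxonomy END.' line.
    """
    taxa, synonyms = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "loadtaxonomy END.":
            break
        m = _TAXON.match(line)
        if m is None:
            raise ValueError("unexpected taxonomy line: %r" % line)
        rank, parent, count = int(m.group(1)), int(m.group(2)), int(m.group(3))
        name = m.group(4).replace('\\"', '"')
        label = m.group(5).replace('\\"', '"') if m.group(5) is not None else None
        target = synonyms if parent < 0 else taxa   # negative parent marks a synonym
        target[rank] = (parent, count, name, label)
    return taxa, synonyms

def lineage(taxa, rank):
    """Names from the root down to the taxon of the given rank."""
    names = []
    while rank != 0 and rank in taxa:
        parent, _, name, _ = taxa[rank]
        names.append(name)
        rank = parent
    return list(reversed(names))
```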

==> crelistfromclientdata{&type=[SQ|AC|SP|KW]}&nl=xx
==> 0 or more lines of data sent by the client to the server
<== code=0&name="xx"&lrank=xx&count=xx\n
Creates a bitlist on the server from data lines sent by the client. Each line contains a sequence name, an accession number, a taxon name, or a keyword.
type: the type of data sent to the server (SQ = sequences, AC = accession numbers, SP = species, KW = keywords); SQ by default
nl: the number of data lines that follow (0 is OK)
code: 0 iff OK; 3 if no list creation is possible; 4 if EOF was reached while reading the nl lines from the client
name: name of the bitlist created from this data
lrank: rank of this bitlist
count: number of elements in the bitlist
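Composing the crelistfromclientdata request is mostly a matter of counting lines correctly. This sketch (crelist_request is a hypothetical name) joins the command line and the nl data lines into a single string to send, and rejects items containing a newline, since each item must fit on one line.

```python
DATA_TYPES = ("SQ", "AC", "SP", "KW")

def crelist_request(items, data_type="SQ"):
    """Command line plus data lines for crelistfromclientdata, as one string."""
    if data_type not in DATA_TYPES:
        raise ValueError("type must be one of %s" % ", ".join(DATA_TYPES))
    for item in items:
        if "\n" in item:
            raise ValueError("each item must fit on a single line: %r" % item)
    header = "crelistfromclientdata&type=%s&nl=%d\n" % (data_type, len(items))
    return header + "".join(item + "\n" for item in items)
```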

==> savelist&lrank=xx{&type=[N|A]}
<== code=0\n
<== list element names or accession numbers on successive lines
<== savelist END.\n
Sends on the socket the names of all elements of a bitlist, one per line; the series ends with the line "savelist END.\n". For sequence lists, &type=A gives accession numbers instead of sequence names.
lrank: rank of the bitlist
type: A gives accession numbers, N (default) gives sequence names; useful for sequence lists only
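Commands such as savelist, prettyseq, extractseqs and loadtaxonomy end their output with a fixed marker line, so a client reads until that marker. A minimal sketch, assuming send_command and read_line are hypothetical callables wrapping the socket, and that read_line returns an empty string when the connection closes. On a non-zero code the sketch raises immediately; whether the server still sends the END marker in that case is not stated here.

```python
def read_until(read_line, end_marker):
    """Collect reply lines up to, but not including, the end marker line."""
    lines = []
    while True:
        line = read_line()
        if line == "":
            raise EOFError("connection closed before %r" % end_marker)
        line = line.rstrip("\n")
        if line == end_marker:
            return lines
        lines.append(line)

def save_list(send_command, read_line, lrank, accessions=False):
    """Names (or accession numbers) of all elements of the list of rank lrank."""
    send_command("savelist&lrank=%d&type=%s" % (lrank, "A" if accessions else "N"))
    status = read_line().strip()
    if status.split("&")[0] != "code=0":
        raise RuntimeError("savelist failed: " + status)
    return read_until(read_line, "savelist END.")
```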

==> modifylist&lrank=..&type=[length|date|scan]&operation=".."
<== code=0&lrank=..&name=".."&count=..{&processed=..}
<== code=3      if it is impossible to create a new list
<== code=2      if the syntax is incorrect, possibly in operation
lrank: (input) rank of the bitlist to be modified; (output) rank of the created bitlist containing the result of the modify operation
type: the kind of modification to perform
operation: for length, as in "> 10000" or "< 500"; for date, as in "> 1/jul/2001" or "< 30/AUG/98"; for scan, the string to be searched for. prep_getannots must be used before modifylist&type=scan, and the client can interrupt the scan operation by sending the escape character on the socket.
name: name of the created bitlist
count: number of elements in the created bitlist
processed: for the scan operation only, the number of list elements scanned until completion or interruption

==> knowndbs{&tag=xx}
<== nl=..\n
<== dbname | on/off | db description\n      nl such lines
Returns, for each database known by the server, its name (a valid value for the db= argument of the acnucopen command), its availability (off means temporarily unavailable), and its description. With the optional tag= argument, only databases tagged with the given string are listed; without it, only untagged databases are listed. The tag argument thus identifies series of special-purpose (tagged) databases, in addition to the default (untagged) ones. The full list of untagged and tagged databases is here.
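Splitting the knowndbs reply into usable records could be done as follows; parse_knowndbs is a hypothetical name, and the nl= header line and the data lines are passed in separately. Fields are split at the first two | characters only, so a description that itself contains | survives intact.

```python
def parse_knowndbs(header, lines):
    """Return (name, available, description) tuples from a knowndbs reply."""
    key, _, value = header.strip().partition("=")
    if key != "nl":
        raise ValueError("unexpected knowndbs header: %r" % header)
    databases = []
    for line in lines[:int(value)]:
        name, status, description = (f.strip() for f in line.split("|", 2))
        databases.append((name, status == "on", description))
    return databases
```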

==> prep_getannots&nl=xx\n
==> key_name{|subkey_name}\n      nl such lines sent to the server
<== code=0\n
This command must be used before the getannots or modifylist&type=scan commands to specify which kinds of annotation records getannots will return or the scan will examine.
nl: the number of key names that follow
key_name: an annotation key name
subkey_name: optionally, an annotation sub-item name (e.g., CDS when key_name = FT)
For the EMBL/SWISSPROT format, keys are: ALL, AC, PR, DT, GN, KW, OS, OC, OG, OX, OH, RN, RC, RP, RX, RA, RG, RT, RL, DR, AH, AS, CC, PE, FH, FT, CO, SQ, SEQ.
For GenBank: ALL, ACCESSION, VERSION, PROJECT, KEYWORDS, SOURCE, ORGANISM, REFERENCE, AUTHORS, CONSRTM, TITLE, JOURNAL, PUBMED, REMARK, COMMENT, FEATURES, ORIGIN, SEQUENCE.
For FT (EMBL, SWISSPROT) and FEATURES (GenBank), one or more specific feature keys can be requested using lines written only in uppercase, such as FEATURES|CDS or FT|TRNA. Keys ALL and SEQ/SEQUENCE stand for all annotation lines and all sequence lines, respectively. For the scan operation, key ALL stands for the DE/DEFINITION lines, and SEQ/SEQUENCE cannot be used (annotations, but not sequences, are scanned).

==> getannots&number=xx
<== code=0\n
<== plain text line ... (a series of consecutive lines)
<== \\\      (a line with exactly three \ characters announces the end of the series)
Gets the annotations of the sequence of rank number; use prep_getannots beforehand to specify which types of annotation lines are desired. Annotation lines from ID+DE (EMBL/SWISSPROT) or LOCUS+DEFINITION (GenBank) are always transferred; for a subsequence, the feature entry is always transferred too.
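A sketch of the two steps needed to get selected annotations: composing the prep_getannots message (command line plus one line per key) and reading a getannots reply up to the line of three backslashes. Function names are hypothetical, and read_line is assumed to return one line from the socket, or an empty string at end of connection.

```python
def prep_getannots_message(keys):
    """prep_getannots command followed by one line per key, e.g. 'OS' or 'FT|CDS'."""
    return "prep_getannots&nl=%d\n" % len(keys) + "".join(key + "\n" for key in keys)

END_OF_ANNOTS = "\\" * 3   # the terminating line is exactly three backslashes

def read_getannots(read_line):
    """Read the reply to getannots&number=...: 'code=0', then lines up to the end line."""
    status = read_line().strip()
    if status != "code=0":
        raise RuntimeError("getannots failed: " + status)
    lines = []
    while True:
        line = read_line()
        if line == "":
            raise EOFError("connection closed before the end of the annotations")
        line = line.rstrip("\n")
        if line == END_OF_ANNOTS:
            return lines
        lines.append(line)
```

A client would send prep_getannots_message(["OS", "FT|CDS"]), read the code=0 answer, then send getannots&number=... and call read_getannots.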

==> prettyseq&num=xx{&bpl=xx}{&translate=[T|F]}
<== code=0\n
<== line1
<== line2 ...
<== prettyseq END.\n
Gets a text representation of the sequence of rank num and of its subsequences, with bpl bases per line (default 60) and optional translation of protein-coding subsequences.

==> extractseqs&[lrank=xx|seqnum=xx]&format=xx&operation=xx{&feature="xx"}{&bounds="xx"}{&minbounds="xx"}{&zlib=[T|F]}\n
command output for format != coordinates:
<== code=xx{&message="xx"}\n
<== line1\n line2\n ... \n
<== <esc>count=xx\n      (one such line at the end of the output related to each member of the list)
... all of this for each sequence of the list if lrank=xx was used ...
<== extractseqs END.\n
command output for format=coordinates:
<== code=xx\n
<== rank=xx&start=xx&end=xx| ... \n      (rank = parent sequence rank in the database; start, end = coordinates in this sequence)
... for each series of coordinates ...
<== extractseqs END.\n
Extracts a list of sequences (lrank argument) or a single sequence (seqnum argument) using different output formats and types of extraction. All formats except "coordinates" extract sequence data; format "coordinates" extracts coordinate data, where start > end indicates the complementary strand.

This command can be interrupted by client sending the escape ASCII character to server on socket; client should keep reading socket until "extractseqs END.\n" arrives.
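With format=coordinates, each reply line holds a |-separated series of rank/start/end triplets. The parser below (parse_coordinates is a hypothetical name) turns one such line into tuples and derives the strand from the rule that start > end indicates the complementary strand; it assumes the fields always appear as rank=...&start=...&end=... and leaves detection of the extractseqs END. line to the caller.

```python
def parse_coordinates(line):
    """Parse 'rank=12&start=100&end=500|rank=12&start=900&end=700|' into tuples.

    Each tuple is (parent_rank, start, end, strand); start > end means the
    complementary strand, reported here as '-'.
    """
    segments = []
    for chunk in line.rstrip("\n").split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue                     # trailing '|' leaves an empty chunk
        values = dict(item.split("=", 1) for item in chunk.split("&"))
        start, end = int(values["start"]), int(values["end"])
        strand = "-" if start > end else "+"
        segments.append((int(values["rank"]), start, end, strand))
    return segments
```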

==> getattributes&[id=xx|rank=xx]{&seq=[T|F]}
<== code=0&rank=xx&name=xx&length=xx{&fr=xx&gc=xx}&acc=xx&descr="xx"&spec="xx"\n
<== {seq=xxxxxx\n}
From a sequence name or accession number (id= argument), or from a sequence rank (rank= argument), returns its rank, name, length, reading frame (0, 1, 2), genetic code (acnuc's), primary accession number, first DE/DEFINITION line, and species name. With seq=T, the full sequence is also returned on a second line. Reading frame and genetic code are not returned for SwissProt.