class Compte

module compte

This class computes counts of word occurrences, particularly inside sequences.

The special character ^ represents the beginnings and ends of sequences. It is handy for building Markovian models from a Compte (see Proportion).

Construction

__init__

The optional keyword fic builds the Compte by reading a file, given by its name, in the specific format described under Input-Output.

add_seq

adds to the count the words of a specified length found in a sequence of letters. The sequence must support the operator __getitem__.

Optional keywords:

deb=d: counts only from position d (>=0) onward, d included;
fin=f: counts only up to position f (<len()), f included;
alpha=a: counts only the words whose letters are all in string a. If a='', alpha is ignored;
fact=f: each word counts for f (default: 1).
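
The counting rule can be illustrated with a short standalone sketch. It is not the code of the compte module: the function count_words and the padding rule are assumptions, chosen so that the result agrees with the three-letter example given under Handling. The sequence is assumed to be a plain string, and the keywords deb, fin and alpha are not modelled.

```python
from collections import Counter

END = "^"  # marker for the beginnings and ends of sequences


def count_words(seq, length, counts=None, fact=1):
    """Count the words of `length` letters in `seq`, padding both limits with '^'.

    Hypothetical helper, not part of the compte module.
    """
    if counts is None:
        counts = Counter()
    # One '^' before the sequence, and enough after it so that every
    # position of the sequence starts a complete word.
    padded = END + seq + END * (length - 1)
    for i in range(len(seq) + 1):
        counts[padded[i:i + length]] += fact
    return counts


counts = Counter()
for s in ("ABCBA", "BCBAA", "BABAB"):
    count_words(s, 3, counts)
```

With these three sequences, counts holds 14 distinct words, among them "A^^" with count 2 and "^AB" with count 1.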

add_pseudo

adds to the count the words of a given list, each with an optional count (default value: 1).

If an element of the list is a string, this string is added with count 1.
If an element of the list is a list [s,c] with s a string and c a number, the word s is added with count c.

read_nf

builds the Compte from a file, given by its name, in the specific format;

Handling

__getitem__: returns the count of the specified word. The character . is a wildcard for any letter except ^.

For example:

>>> import compte
>>> c=compte.Compte()
>>> c.add_seq("ABCBA",3)
>>> c.add_seq("BCBAA",3)
>>> c.add_seq("BABAB",3)
>>> print c
AA^ 1
ABA 1
ABC 1
AB^ 1
A^^ 2
CBA 2
BAA 1
BAB 2
BA^ 1
BCB 2
B^^ 1
^AB 1
^BA 1
^BC 1
>>> c['A']
6
>>> c['A.'] # number of 'A's followed by a letter
4
>>> c['BAB']
2
>>> c['BCB']
2
>>> c['B.B']
4
>>> c['B^'] # number of ending 'B's
1
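
The lookups in this example are consistent with summing the counts of every stored word that starts with the requested pattern. The sketch below shows that reading on a plain dictionary. The function lookup is hypothetical, and the prefix-sum rule is an assumption that only holds when all stored words have the same length, as they do here.

```python
def lookup(counts, pattern):
    """Sum the counts of the words starting with `pattern`.

    Hypothetical helper: '.' matches any letter except '^'.
    """
    def matches(word):
        if len(word) < len(pattern):
            return False
        return all(p == w or (p == "." and w != "^")
                   for p, w in zip(pattern, word))

    return sum(n for word, n in counts.items() if matches(word))


# The counts printed in the example above, as a plain dictionary.
counts = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}

a_total = lookup(counts, "A")         # every word starting with 'A'
a_then_letter = lookup(counts, "A.")  # 'A' followed by a letter, not by '^'
```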

__iadd__

adds to self the counts of a Compte;

copy

returns a new Compte which is the copy of self;

min

returns a new Compte made of the minimum of the counts of self and another Compte;

max

returns a new Compte made of the maximum of the counts of self and another Compte;

intersects

returns a new Compte made of the counts of self for the words that are also counted in another Compte;
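
As an illustration of min, max and intersects, the same operations can be written on the standard collections.Counter. This is a sketch only: whether Compte drops or keeps the words that are missing from one of the two operands is not stated here, and the reading of intersects shown below is an assumption.

```python
from collections import Counter

a = Counter({"AB": 3, "BA": 1, "B^": 2})
b = Counter({"AB": 1, "BA": 4, "^A": 5})

minimum = a & b  # per-word minimum; a word absent from one side is dropped
maximum = a | b  # per-word maximum; words from both sides are kept

# One possible reading of intersects: keep the counts of a
# for the words that b also counts.
shared = Counter({w: n for w, n in a.items() if w in b})
```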

__idiv__

divides all of the counts by a specified value;

restrict_to

returns a new Compte which is self restricted to the letters of the specified string. The end-character ^ is kept;

strip

returns a new Compte from the words of self that do not have the end-character ^;

rstrip

returns a new Compte from the words of self that do not end with character ^;

lg_max

returns the length of the longest word;

alph

returns the list of the letters used in the counts;

words

returns the list of the counted words;

prop

returns the corresponding Proportion.

Optional keywords:

lpost=l: specifies the length of the posterior words, i.e. the words whose frequencies are computed. Default: the maximum length of the words;
lprior=l: specifies the length of the prior words, i.e. the words on which the computed words depend, in a Markovian context. Default: 0.
As the special character "^" stands for the limits of the sequence, the words ending with this symbol are counted as words of the given length or longer;
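
The role of lprior and lpost can be sketched as a conditional frequency computed from the counts. The function proportions below is hypothetical and says nothing about how the Proportion class works internally. It assumes that only the words of length lprior + lpost are used, and it does not model the special treatment of the words ending with ^.

```python
from collections import defaultdict


def proportions(counts, lprior, lpost):
    """Estimate the frequency of each posterior word given its prior word.

    Hypothetical helper: uses only the words of length lprior + lpost.
    """
    size = lprior + lpost
    totals = defaultdict(float)
    for word, n in counts.items():
        if len(word) == size:
            totals[word[:lprior]] += n
    return {
        (word[:lprior], word[lprior:]): n / totals[word[:lprior]]
        for word, n in counts.items()
        if len(word) == size
    }


counts = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}

# Frequency of the next letter given the two previous ones.
probs = proportions(counts, lprior=2, lpost=1)
```

Here the prior word "BA" is followed by "A" once, by "B" twice and by "^" once, so probs[("BA", "B")] is 2 out of 4.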

next

returns a list of [letter,count] of letters following the specified word;
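
A minimal sketch of this kind of query on a plain dictionary is given below. The function next_letters is hypothetical, and both the ordering of the result and the inclusion of the end-character ^ among the following letters are assumptions.

```python
def next_letters(counts, word):
    """Return sorted [letter, count] pairs for the letters that follow `word`.

    Hypothetical helper, not part of the compte module.
    """
    following = {}
    for w, n in counts.items():
        if len(w) > len(word) and w.startswith(word):
            letter = w[len(word)]
            following[letter] = following.get(letter, 0) + n
    return sorted([letter, n] for letter, n in following.items())


counts = {"BAA": 1, "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1}

after_ba = next_letters(counts, "BA")
```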

has_prefix

returns True if the specified word is a strict prefix of a word in the Compte;

is_empty

returns True if the Compte is empty.

Input-Output

The specific format is:

description: lines of word and count, separated by a space or a tab;
example:

AB 3
B^ 5
^BBC 1

pref: returns, as a string in the same format as __str__, the Compte of the words of a specified length;
__str__: outputs the Compte in the specific format.
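
The specific format is simple enough to read and write by hand. The sketch below shows one way to do it on a plain dictionary. The functions parse_counts and format_counts are hypothetical and are not the code behind read_nf or __str__. Skipping malformed lines, summing repeated words and sorting the output are choices made for this sketch.

```python
def parse_counts(text):
    """Parse lines of 'word count' separated by a space or a tab."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()  # splits on spaces and tabs alike
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, raw = fields
        try:
            value = int(raw)
        except ValueError:
            value = float(raw)
        # A word listed twice has its counts summed (a choice of this sketch).
        counts[word] = counts.get(word, 0) + value
    return counts


def format_counts(counts):
    """Write one 'word count' line per word, in sorted order."""
    return "\n".join("{} {}".format(w, n) for w, n in sorted(counts.items()))


parsed = parse_counts("AB 3\nB^\t5\n\n^BBC 1")
```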