
class Compte

module compte


This class is used to count word occurrences, particularly inside sequences.

The special character ^ represents the beginning and the end of a sequence. This marker is handy for building Markovian models from a Compte (see Proportion).

Construction

__init__
the optional keyword fic builds the Compte by reading a file, given by its name, in the specific format (see Input-Output);
add_seq
adds to the counts the words of a specified length found in a sequence of letters. The sequence must support the operator __getitem__.

Optional keywords:

deb=d
counts only from position d (>= 0), inclusive;
fin=f
counts only up to position f (< length of the sequence), inclusive;
alpha=a
counts only the words whose letters are all in string a. If a="", alpha is ignored;
fact=f
each word occurrence adds f to the count (default: 1).
add_pseudo
adds to the count the given word with optional count (default value: 1).
read_nf
builds the counts from a file, given by its name, in the specific format (see Input-Output).
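
The windowing that add_seq describes can be pictured with a small stand-alone sketch. It does not use the compte module: count_words and END are hypothetical names, and the padding rule (one ^ before the sequence, length - 1 after it) is an assumption inferred from the word list printed in the example given under __getitem__. The real method may handle deb, fin, alpha and very short sequences differently.

```python
from collections import Counter

END = "^"  # boundary marker, as in the documentation


def count_words(seq, length, counts=None, fact=1):
    """Add to `counts` every word of `length` letters found in `seq`.

    Hypothetical helper, not the compte implementation. The sequence is
    padded with one "^" on the left and `length - 1` "^" on the right
    (an assumption that matches the documented example for length 3).
    """
    if counts is None:
        counts = Counter()
    padded = END + "".join(seq) + END * (length - 1)
    # Slide a window of `length` letters over the padded sequence.
    for i in range(len(padded) - length + 1):
        counts[padded[i:i + length]] += fact
    return counts


c = count_words("ABCBA", 3)
count_words("BCBAA", 3, c)
count_words("BABAB", 3, c)
for word in sorted(c):
    print(word, c[word])
```

Under these assumptions the three calls yield the same 14 words and counts as the documented session, for instance 2 for A^^ and 1 for ^BC.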

Handling

__getitem__
returns the count of the specified word. The character . is a wildcard that matches any letter except ^.

For example:

>>> import compte
>>> c=compte.Compte()
>>> c.add_seq("ABCBA",3)
>>> c.add_seq("BCBAA",3)
>>> c.add_seq("BABAB",3)
>>> print c
AA^ 1
ABA 1
ABC 1
AB^ 1
A^^ 2
CBA 2
BAA 1
BAB 2
BA^ 1
BCB 2
B^^ 1
^AB 1
^BA 1
^BC 1
>>> c['A']
6
>>> c['A.'] # number of 'A's followed by a letter
4
>>> c['BAB']
2
>>> c['BCB']
2
>>> c['B.B']
4
>>> c['B^'] # number of ending 'B's
1
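
The results in this example are consistent with a lookup that treats the requested word as a prefix pattern and sums the counts of every stored word it matches. The following is a minimal sketch of that reading, not the module's code: lookup is a hypothetical function, and the dictionary simply copies the counts printed above.

```python
# Counts copied from the documented example (three sequences, words of length 3).
counts = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}


def lookup(counts, pattern):
    """Sum the counts of the stored words that start with `pattern`.

    Hypothetical helper: "." matches any letter except "^"; every other
    character, "^" included, must match exactly.
    """
    total = 0
    for word, n in counts.items():
        if len(word) < len(pattern):
            continue  # the stored word is too short to match
        if all(p == w or (p == "." and w != "^")
               for p, w in zip(pattern, word)):
            total += n
    return total


print(lookup(counts, "A"), lookup(counts, "A."), lookup(counts, "B.B"))
```

With these counts the sketch returns 6 for A, 4 for A., 4 for B.B and 1 for B^, the same values as the session above.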
__iadd__
adds to self the counts of a Compte;
copy
returns a new Compte which is the copy of self;
min
returns a new Compte made of the minimum of the counts of self and another Compte;
max
returns a new Compte made of the maximum of the counts of self and another Compte;
intersects
returns a new Compte made of the counts of self for the words that are also counted in another Compte;
__idiv__
divides all of the counts by a specified value;
restrict_to
returns a new Compte which is self restricted to the letters of the specified string. The end-character ^ is kept;
strip
returns a new Compte from the words of self that do not contain the end-character ^;
rstrip
returns a new Compte from the words of self that do not end with the character ^;
lg_max
returns the length of the longest word;
alph
returns the list of the letters used in the counts;
words
returns the list of the counted words;
prop
returns the corresponding Proportion.

Optional keywords:

lpost=l
specifies the length of the posterior words, i.e. the words whose frequencies are computed. Default: the maximum length of the words;
lprior=l
specifies the length of the prior words, i.e. the words on which the computed words depend, in a Markovian context. Default: 0;

Because the special character ^ stands for the limits of the sequence, words ending with this symbol are counted as words of the given length or longer;

next
returns a list of [letter, count] pairs for the letters that follow the specified word;
has_prefix
returns True if the specified word is a strict prefix of a word in the Compte;
is_empty
returns True if the Compte is empty.
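
The relation between next and the Markovian use of prop can be illustrated with plain dictionaries. The sketch below is only an interpretation under stated assumptions: following and conditional_proportions are hypothetical names, the counts are those of the example under __getitem__, and the estimate shown (count of prior + letter divided by count of prior) corresponds to a prior of the given word and a posterior of one letter. How the module itself treats words ending in ^ is not reproduced here.

```python
# Counts copied from the example under __getitem__.
counts = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}


def following(counts, word):
    """Return {letter: count} for the letters seen right after `word`."""
    result = {}
    for stored, n in counts.items():
        if len(stored) > len(word) and stored.startswith(word):
            letter = stored[len(word)]
            result[letter] = result.get(letter, 0) + n
    return result


def conditional_proportions(counts, prior):
    """Estimate the proportion of each letter following `prior`."""
    nxt = following(counts, prior)
    total = sum(nxt.values())
    if total == 0:
        return {}  # `prior` is never followed by anything
    return {letter: n / total for letter, n in nxt.items()}


print(following(counts, "B"))
print(conditional_proportions(counts, "BA"))
```

In this sketch the boundary marker ^ is kept as a possible following letter, so the proportions after BA sum to 1 over A, B and ^.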

Input-Output

The specific format is:

description
lines, each made of a word and a count separated by a space or a tab

example
AB     3
B^      5
^BBC  1
pref
returns, as a string in the same format as __str__, the counts of the words of the specified length;
__str__
returns the string of the Compte in the specific format.
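
A reader and writer for this word/count format can be sketched in a few lines. This is an illustration only, not the code behind read_nf or __str__: parse_counts and format_counts are hypothetical names, counts are read as floats on the assumption that divided or weighted counts may be fractional, and lines that do not hold exactly two fields are skipped.

```python
def parse_counts(text):
    """Read "word count" lines separated by spaces or tabs into a dict."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 2:
            continue  # assumption: ignore blank or malformed lines
        word, value = fields
        counts[word] = counts.get(word, 0.0) + float(value)
    return counts


def format_counts(counts):
    """Write one "word<TAB>count" line per word, sorted by word."""
    return "\n".join(
        f"{word}\t{value:g}" for word, value in sorted(counts.items())
    )


sample = "AB     3\nB^      5\n^BBC  1\n"
parsed = parse_counts(sample)
print(format_counts(parsed))
```

The sample string reuses the three lines of the example above. Sorting the words on output is a choice made for this sketch; the order used by the module is not documented here.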
