class Compte
module compte
This class computes counts of word occurrences, particularly inside
sequences.
The special character ^ represents the beginnings and ends of sequences.
This character is handy for building Markovian models from a Compte (see
Proportion).
Construction
- __init__ - the optional keyword fic allows construction by reading from
  a file in the specific format (see Input-Output);
- add_seq - adds to the count the words of a specified length that occur
  in a sequence of letters. The sequence must support the __getitem__
  operator.
  Optional keywords:
  - deb=d - counts only from position d (>=0), inclusive;
  - fin=f - counts only up to position f (<len()), inclusive;
  - alpha=a - counts only the words whose letters are all in string a.
    If a='', alpha is ignored;
  - fact=f - each word counts for f (default: 1).
- add_pseudo - adds to the count the words of a given list, each with an
  optional count (default value: 1).
  - If an element of the list is a string, this string is added with
    count 1.
  - If an element of the list is a list [s,c], with s a string and c a
    number, the word s is added with count c.
- read_nf - builds from a file in the specific format.
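
The way add_seq fills the counts can be read off the example given in the Handling section: each sequence appears to be preceded by one ^ and followed by enough ^ to complete the last words. Below is a minimal sketch of that windowing in plain Python. It is not the Compte implementation: the function name count_words and the padding rule are assumptions inferred from that example, and the deb, fin and alpha keywords are not modelled.

```python
from collections import Counter

END = "^"  # special character marking the limits of a sequence


def count_words(seq, length, fact=1, counts=None):
    """Hypothetical stand-in for add_seq: count the words of `length` in `seq`."""
    if counts is None:
        counts = Counter()
    # Assumed padding: one leading ^ and (length - 1) trailing ^.
    padded = END + "".join(seq) + END * (length - 1)
    # Slide a window of `length` letters over the padded sequence.
    for i in range(len(padded) - length + 1):
        counts[padded[i:i + length]] += fact
    return counts


counts = Counter()
for s in ("ABCBA", "BCBAA", "BABAB"):
    count_words(s, 3, counts=counts)

for word in sorted(counts):
    print(word, counts[word])
```

With these three sequences the sketch yields the same 14 words and counts as the example, for instance A^^ with count 2 and ^BC with count 1.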
Handling
- __getitem__ - returns the count of the specified word. The character .
  is a wildcard for all letters except ^.
For example:
>>> import compte
>>> c=compte.Compte()
>>> c.add_seq("ABCBA",3)
>>> c.add_seq("BCBAA",3)
>>> c.add_seq("BABAB",3)
>>> print c
AA^ 1
ABA 1
ABC 1
AB^ 1
A^^ 2
CBA 2
BAA 1
BAB 2
BA^ 1
BCB 2
B^^ 1
^AB 1
^BA 1
^BC 1
>>> c['A']
6
>>> c['A.'] # number of 'A's followed by a letter
4
>>> c['BAB']
2
>>> c['BCB']
2
>>> c['B.B']
4
>>> c['B^'] # number of ending 'B's
1
########################
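
The results of the example above are consistent with a simple rule: the count of a word is the sum of the counts of the stored words that begin with it, with . standing for any letter other than ^. The sketch below illustrates that rule on a plain dictionary. The names matches and lookup are hypothetical, and the rule itself is inferred from the example rather than taken from the Compte source.

```python
# Counts copied from the example above.
COUNTS = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}


def matches(pattern, word):
    """True if `word` starts with `pattern`, '.' matching any letter but '^'."""
    if len(pattern) > len(word):
        return False
    return all(p == w or (p == "." and w != "^") for p, w in zip(pattern, word))


def lookup(counts, pattern):
    """Hypothetical stand-in for __getitem__: sum the counts of matching words."""
    return sum(n for word, n in counts.items() if matches(pattern, word))


for pattern in ("A", "A.", "BAB", "BCB", "B.B", "B^"):
    print(pattern, lookup(COUNTS, pattern))
```

Because add_seq counts a word at every position of a sequence, summing over the stored words that start with a shorter word gives the number of occurrences of that shorter word.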
- __iadd__ - adds to self the counts of a Compte;
- copy - returns a new Compte which is a copy of self;
- min - returns a new Compte made of the minimum of the counts of self
  and another Compte;
- max - returns a new Compte made of the maximum of the counts of self
  and another Compte;
- intersects - returns a new Compte made of the counts of self for the
  words that are also counted in another Compte;
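
As an illustration of the combining operations listed above, here is a sketch on plain dictionaries mapping words to counts. The function names are hypothetical, and two points are assumptions rather than documented behaviour: a word missing from one table is treated as having count 0 (so it drops out of the minimum), and intersects is read as keeping the counts of the first table.

```python
def add_counts(a, b):
    """Sum of two count tables, as `a += b` is described to do."""
    out = dict(a)
    for word, n in b.items():
        out[word] = out.get(word, 0) + n
    return out


def min_counts(a, b):
    """Word-wise minimum; words absent from either table are dropped."""
    return {word: min(n, b[word]) for word, n in a.items() if word in b}


def max_counts(a, b):
    """Word-wise maximum over the words of both tables."""
    return {word: max(a.get(word, 0), b.get(word, 0)) for word in set(a) | set(b)}


def intersect_counts(a, b):
    """Counts of `a`, kept only for the words that `b` also counts."""
    return {word: n for word, n in a.items() if word in b}


a = {"AB": 3, "BA": 1}
b = {"AB": 2, "BB": 5}
print(add_counts(a, b))
print(min_counts(a, b))
print(max_counts(a, b))
print(intersect_counts(a, b))
```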

- __idiv__ - divides all of the counts by a specified value;
- restrict_to - returns a new Compte which is self restricted to the
  letters of the specified string. The end-character ^ is kept;
- strip - returns a new Compte made of the words of self that do not
  contain the end-character ^;
- rstrip - returns a new Compte made of the words of self that do not
  end with the character ^;
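
The three filters above can be sketched on a plain dictionary of counts. This is only one reading of the descriptions: the function names are hypothetical, and restrict_to is assumed here to drop every word containing a letter outside the given string, which may differ from what Compte actually does.

```python
END = "^"  # end-character

# Counts taken from the example of the Handling section.
COUNTS = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}


def restrict_to(counts, letters):
    """Keep the words whose letters are all in `letters`; '^' is always kept."""
    allowed = set(letters) | {END}
    return {w: n for w, n in counts.items() if set(w) <= allowed}


def strip(counts):
    """Keep the words that do not contain the end-character."""
    return {w: n for w, n in counts.items() if END not in w}


def rstrip(counts):
    """Keep the words that do not end with the end-character."""
    return {w: n for w, n in counts.items() if not w.endswith(END)}


print(sorted(restrict_to(COUNTS, "AB")))
print(sorted(strip(COUNTS)))
print(sorted(rstrip(COUNTS)))
```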

- lg_max - returns the length of the longest word;
- alph - returns the list of the letters used in the counts;
- words - returns the list of the counted words;
- prop - returns the corresponding Proportion.
  Optional keywords:
  - lpost=l - specifies the length of the posterior words, i.e. the
    words whose frequencies are computed. Default: the maximum length
    of the words;
  - lprior=l - specifies the length of the prior words, i.e. the words
    on which the computed words depend, in a Markovian context.
    Default: 0.
  As the special character "^" stands for the limits of the sequence,
  the words ending with this symbol are counted as words of "the given
  length or longer";
- next - returns a list of [letter,count] pairs for the letters
  following the specified word;
- has_prefix - returns True if the specified word is a strict prefix of
  a word in the Compte;
- is_empty - returns True if the Compte is empty.
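
To make the link between counts and a Markovian model more concrete, the sketch below computes, from a dictionary of counts, the letters following a word and their share of the total, which corresponds to the case of one posterior letter after a given prior word. All function names are hypothetical. How Compte and Proportion actually compute these values, and whether ^ is reported among the following letters, is not stated in this documentation and is assumed here.

```python
# Counts taken from the example of the Handling section.
COUNTS = {
    "AA^": 1, "ABA": 1, "ABC": 1, "AB^": 1, "A^^": 2, "CBA": 2, "BAA": 1,
    "BAB": 2, "BA^": 1, "BCB": 2, "B^^": 1, "^AB": 1, "^BA": 1, "^BC": 1,
}


def next_letters(counts, word):
    """Hypothetical stand-in for next: [letter, count] pairs following `word`."""
    following = {}
    for w, n in counts.items():
        if len(w) > len(word) and w.startswith(word):
            letter = w[len(word)]
            following[letter] = following.get(letter, 0) + n
    return [[letter, n] for letter, n in sorted(following.items())]


def has_prefix(counts, word):
    """True if `word` is a strict prefix of at least one counted word."""
    return any(w != word and w.startswith(word) for w in counts)


def conditional_proportions(counts, prior):
    """Share of each letter observed right after `prior`."""
    pairs = next_letters(counts, prior)
    total = sum(n for _, n in pairs)
    return {letter: n / total for letter, n in pairs}


print(next_letters(COUNTS, "B"))
print(has_prefix(COUNTS, "BA"), has_prefix(COUNTS, "BAB"))
print(conditional_proportions(COUNTS, "B"))
```

In this reading, the proportion of A after B is the count of BA divided by the count of all two-letter words starting with B.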
Input-Output
Specific format is:
  description
  lines of word and count separated by a space or a tabulation
- pref - returns, as a string in the same format as __str__, the Compte
  of the words of the specified length;
- __str__ - outputs in the specific format.
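
For readers who want to handle such files without the module, here is a small sketch of reading and writing lines of "word count" in plain Python. The function names are hypothetical, and the sketch assumes that every non-blank line holds exactly one word and one number; any other line is skipped. Counts are read as floats, since __idiv__ can produce non-integer values.

```python
def parse_counts(text):
    """Read 'word count' lines separated by spaces or tabs into a dict."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 2:
            continue  # assumption: blank or malformed lines are ignored
        word, number = fields
        counts[word] = counts.get(word, 0.0) + float(number)
    return counts


def format_counts(counts):
    """Write one 'word<TAB>count' line per word, sorted by word."""
    return "\n".join("%s\t%g" % (w, n) for w, n in sorted(counts.items()))


text = "AA^ 1\nABA\t1\n\nA^^ 2\n"
counts = parse_counts(text)
print(format_counts(counts))
```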