class Proportion
module compte
This class stores the proportions of words that follow given words.
For example, it can store the fact that the proportions of the letters
following the word AC are:
A  0.34
C  0.15
G  0.23
T  0.28
A distinction is thus made between prior words (such as AC here)
and posterior words (such as A, C, G and T here).
The special character ^ represents the beginning and the end of a
sequence.
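As an illustration only, the example above can be pictured as a mapping from each prior to its posteriors and their proportions. The dictionary layout and the name `proportions` are assumptions made for this sketch; they are not the internal representation of the class.

```python
# Hypothetical layout: prior word -> {posterior word: proportion}.
proportions = {
    "AC": {"A": 0.34, "C": 0.15, "G": 0.23, "T": 0.28},
}

# Posteriors recorded for the prior "AC".
posteriors = proportions["AC"]

# In this example the proportions for a given prior sum to 1.
total = sum(posteriors.values())
```

Here AC is the prior and A, C, G and T are its posteriors.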
Construction

__init__

Optional keywords:

fic
 allows construction by reading from a file, given its name, in the
specific format;
 str
 allows construction by reading from a string in the specific format;
 read_nf
 builds from a file, given its name, in the specific format;
 read_str
 builds from a string in the specific format;
 read_Compte
 builds from a Compte.
Optional keywords:

lpost=l
 specifies the length of the posterior words, i.e. the words whose
frequencies are computed. Default: the maximum length of the words;
 lprior=l
 specifies the length of the prior words, i.e. the words on which the
computed words depend, in a Markovian context. Default: 0.
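To make the roles of the two lengths concrete, here is a minimal sketch that derives proportions directly from a plain string. The function `proportions_from_sequence` is hypothetical and does not go through a Compte; it also ignores the boundary character ^.

```python
from collections import Counter, defaultdict


def proportions_from_sequence(seq, lprior=0, lpost=1):
    """Count each (prior, posterior) pair in seq, then normalize per prior.

    Hypothetical helper for illustration; sequence boundaries are ignored.
    """
    counts = defaultdict(Counter)
    for i in range(lprior, len(seq) - lpost + 1):
        prior = seq[i - lprior:i]        # the lprior letters before position i
        posterior = seq[i:i + lpost]     # the lpost letters starting at position i
        counts[prior][posterior] += 1
    props = {}
    for prior, post_counts in counts.items():
        total = sum(post_counts.values())
        props[prior] = {post: n / total for post, n in post_counts.items()}
    return props


# First-order (Markovian) proportions: one letter given the previous letter.
order1 = proportions_from_sequence("AABAB", lprior=1, lpost=1)

# With lprior=0 the only prior is the empty word: overall letter proportions.
order0 = proportions_from_sequence("AABAB", lprior=0, lpost=1)
```

With lprior=0 the result has a single empty prior, which corresponds to overall proportions of the posterior words.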
As the special character "^" stands for the limits of the sequence,
words terminating with this symbol are counted as words of the
"same length or longer than the given length".
Handling

__getitem__
 returns the string, in the format of Compte, of the posterior
corresponding to the given prior;
 limit_on
 returns a SHALLOW copy of the Proportion limited to the priors that
contain words of a given list;
 __iadd__
 adds to self the proportions of another Proportion;
 KL_MC
 computes the Kullback-Leibler distance to another Proportion, by
Monte Carlo simulation on several (default: 100) Sequences of a
given length (default: 1000) generated by the method read_prop of
Sequence. See Sequence generation;
 lg_max
 returns the length of the longest word, prior
or posterior (prior+posterior);
 lg_max_prior
 returns the length of the longest
prior;
 lg_max_posterior
 returns the length of the longest
posterior;
 alph
 returns the list of the letters used in the
proportions;
 prefixes
 returns the list of the prefixes;
 next
 returns a list of [posterior, proportion] pairs for the specified
prior. The wildcard letter ’.’ stands for any letter;
 rand_next
 returns a posterior for a specified prior, chosen randomly according
to the proportions of that prior;
 has_prefix
 returns True if the specified word is a valid prior. Remember that
the character ^ stands for the beginning or the end of a sequence;
 is_empty
 returns True if the Proportion is
empty;
 has_post
 returns True if the Proportion has
a posterior with an empty prior.
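The behaviour described for next and rand_next can be sketched as follows. This is not the library's code: the dictionary layout and the functions `matches`, `next_posteriors` and `rand_next` are hypothetical. How the real method combines several priors matched by the wildcard is not stated here, so this sketch simply concatenates their pairs.

```python
import random


def matches(pattern, word):
    """True if word fits pattern, where '.' in pattern stands for any letter."""
    return len(pattern) == len(word) and all(
        p == "." or p == w for p, w in zip(pattern, word)
    )


def next_posteriors(props, prior):
    """List of [posterior, proportion] pairs for every stored prior matching `prior`."""
    pairs = []
    for stored, posts in props.items():
        if matches(prior, stored):
            pairs.extend([post, p] for post, p in posts.items())
    return pairs


def rand_next(props, prior, rng=random):
    """Draw one posterior with probability proportional to its proportion."""
    pairs = next_posteriors(props, prior)
    if not pairs:
        raise KeyError(prior)
    posteriors = [post for post, _ in pairs]
    weights = [p for _, p in pairs]
    return rng.choices(posteriors, weights=weights, k=1)[0]


# Hypothetical layout: prior word -> {posterior word: proportion}.
props = {
    "AA": {"A": 0.2, "B": 0.8},
    "AB": {"A": 1.0},
    "BA": {"B": 1.0},
}
exact = next_posteriors(props, "BA")
wild = next_posteriors(props, "A.")
drawn = rand_next(props, "BA")
```

Because `random.choices` accepts arbitrary non-negative weights, the draw does not require the proportions to sum to 1.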
Input-Output
The specific format is:

description: lines of prior and posterior, then count, separated by whitespace

example:

AB  0.3 
AA  0.7 
BB  0.5 
BA  0.5 
A  0.1 
B  0.9 
^ A  0.5 
^ B  0.5 

In this example, following an A, the proportion of B is 0.3 and the
proportion of A is 0.7. The overall proportion of A is 0.1, and that
of B is 0.9. The proportion of sequences beginning with A is 0.5, as
is the proportion of sequences beginning with B.

__str__
 outputs in specific format;
 loglex
 returns the corresponding
Descripteur. See
read_prop in that class;
Sequence generation
From a Proportion, a (part of a)
Sequence can be generated randomly,
by the method read_prop.
The process is:

for all increasing positions i:

get the longest word w ending at i-1 that is a valid prior (using
method has_prefix);
 if there is a posterior corresponding to w, let lp be the list of
the corresponding pairs [posterior, proportion] (using method next);
otherwise, lp is the uniform distribution over all letters of the
Proportion;
 randomly choose a posterior p according to the factors in list lp
(see below);
 if the first letter of p is a terminating character (^), put the
null character (’\0’) at that position, and exit; otherwise put that
letter at position i of the Sequence.
In fact, the random choice of the posterior is made with
probabilities proportional to the respective proportions, even if
the sum of the proportions differs from 1. Sequence generation is
therefore possible even with non-orthodox proportions.
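The process above can be sketched in plain Python. This is an illustration under stated assumptions, not the code of read_prop: the dictionary layout and the functions `longest_valid_prior` and `generate` are hypothetical, the start of the sequence is modelled by prepending ^ to the context, and generation simply stops at a terminator instead of writing a null character.

```python
import random

TERMINATOR = "^"


def longest_valid_prior(props, context):
    """Longest suffix of context that is a stored prior ('' if none is)."""
    max_len = max((len(prior) for prior in props), default=0)
    for length in range(min(max_len, len(context)), 0, -1):
        suffix = context[-length:]
        if suffix in props:
            return suffix
    return ""


def generate(props, length, rng=random):
    """Generate up to `length` letters from props (prior -> {posterior: weight})."""
    # Letters used as posteriors, for the uniform fallback.
    alphabet = sorted(
        {post[0] for posts in props.values() for post in posts} - {TERMINATOR}
    )
    out = []
    for _ in range(length):
        context = TERMINATOR + "".join(out)   # assumption: ^ marks the start
        prior = longest_valid_prior(props, context)
        posts = props.get(prior)
        if posts:
            candidates, weights = zip(*posts.items())
        else:
            candidates, weights = alphabet, [1.0] * len(alphabet)
        # Weights need not sum to 1: choices() normalizes them implicitly.
        p = rng.choices(candidates, weights=weights, k=1)[0]
        if p[0] == TERMINATOR:
            break                             # end of sequence reached
        out.append(p[0])
    return "".join(out)


# Deterministic chain: start -> A -> B -> end.
chain = {"^": {"A": 1.0}, "A": {"B": 1.0}, "B": {"^": 1.0}}
word = generate(chain, 10)

# Proportions that do not sum to 1 are still usable.
unnormalized = {"": {"A": 2.0, "B": 6.0}}
sample = generate(unnormalized, 5)
```

In the second call B is drawn three times as often as A on average, which is what proportional (rather than normalized) weighting implies.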