class Proportion
module compte
This class is used for proportions of words that follow given words.
For example, it can store the fact that the proportions of letters
following word AC are:
A | 0.34 |
C | 0.15 |
G | 0.23 |
T | 0.28 |
A distinction is thus made between prior words (such as AC here)
and posterior words (such as A, C, G and T here).
The special character ^ is used to represent the beginnings and
ends of sequences.
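As a rough picture of the prior/posterior distinction, the table above can be seen as a nested mapping from prior words to posterior words. The following is a plain-Python sketch, not the Proportion class itself; the name proportions is made up for this illustration.

```python
# Hypothetical stand-in for a Proportion: prior word -> {posterior word: proportion}.
# This is NOT the real class, only a picture of what it stores.
proportions = {
    "AC": {"A": 0.34, "C": 0.15, "G": 0.23, "T": 0.28},
}

# Posteriors that may follow the prior "AC", with their proportions.
following_ac = proportions["AC"]

# The proportions listed for this prior add up to 1.
total = sum(following_ac.values())

# The posterior with the largest proportion after "AC".
most_likely = max(following_ac, key=following_ac.get)
```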
Construction
- __init__
- Optional keywords:
  - fic
  - allows construction by reading from a file, given its name, in
    the specific format;
  - str
  - allows construction by reading from a string in the specific
    format;
- read_nf
- builds from a file, given its name, in the specific format;
- read_str
- builds from a string in the specific format;
- read_Compte
- builds from a Compte. Optional keywords:
  - lpost=l
  - specifies the length of the posterior words, i.e. the words
    whose frequencies are computed. Default: the maximum length of
    the words;
  - lprior=l
  - specifies the length of the prior words, i.e. the words on
    which the computed words depend, in a Markovian context.
    Default: 0;
  As the special character "^" stands for the limits of the
  sequence, the words terminating with this symbol are counted as
  "same length or longer than the given length" words.
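To make the roles of lprior and lpost concrete, here is a minimal sketch of how word counts could be turned into proportions. It is an assumption about the general idea, not the code behind read_Compte: the function proportions_from_sequence is hypothetical, it counts words directly in a string instead of reading a Compte, it defaults lpost to 1 for simplicity, and it ignores the "^" boundary handling described above.

```python
from collections import Counter, defaultdict

def proportions_from_sequence(seq, lprior=0, lpost=1):
    """Hypothetical helper: count every word of length lprior+lpost in seq,
    split it into (prior, posterior), and normalise the counts per prior.
    Sequence boundaries ("^") are deliberately ignored in this sketch."""
    counts = defaultdict(Counter)
    width = lprior + lpost
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        counts[word[:lprior]][word[lprior:]] += 1
    props = {}
    for prior, posts in counts.items():
        total = sum(posts.values())
        props[prior] = {post: n / total for post, n in posts.items()}
    return props

# Order-1 context: one prior letter, one posterior letter.
props = proportions_from_sequence("ABABBA", lprior=1, lpost=1)

# No context (lprior=0): the only prior is the empty word.
overall = proportions_from_sequence("AAB")
```

With lprior=0 every posterior is attached to the empty prior, which matches the "Default: 0" behaviour described for lprior.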
Handling
- __getitem__
- returns the string, in the format of Compte, of the posterior
corresponding to the given prior;
- limit_on
- returns a SHALLOW copy of the
Proportion limited to the priors that contain words of a
given list;
- __iadd__
- adds to self the proportions of
another Proportion;
- KL_MC
- computes the Kullback-Leibler distance to a Proportion, by Monte
Carlo simulation on several (default: 100) Sequences of a given
length (default: 1000) generated by method read_prop of Sequence.
See Sequence generation;
- lg_max
- returns the length of the longest word, prior
or posterior (prior+posterior);
- lg_max_prior
- returns the length of the longest
prior;
- lg_max_posterior
- returns the length of the longest
posterior;
- alph
- returns the list of the letters used in the
proportions;
- prefixes
- returns the list of the prefixes;
- next
- returns a list of [posterior,proportion] for the specified prior.
The wildcard letter '.' stands for any letter;
- rand_next
- returns a posterior for a specified prior, chosen randomly
according to the proportions of that prior;
- has_prefix
- returns True if the specified word is a valid prior. Remember that
character ^ stands for the beginning or end of a sequence;
- is_empty
- returns True if the Proportion is
empty;
- has_post
- returns True if the Proportion has
a posterior with an empty prior.
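The Monte Carlo idea behind KL_MC can be sketched as follows. This is only an illustration under strong assumptions: both models are reduced to zero-order letter proportions stored in plain dicts, the function kl_monte_carlo is hypothetical, and the per-letter normalisation is a choice made here, not something the documentation states. The defaults mirror the 100 sequences of length 1000 mentioned above.

```python
import math
import random

def kl_monte_carlo(p, q, nb_seq=100, length=1000, seed=0):
    """Hypothetical sketch: estimate the per-letter Kullback-Leibler
    distance from p to q, where p and q map letters to proportions.
    Sequences are drawn from p and log(p(x)/q(x)) is averaged over
    all drawn letters. Every letter of p must have q[letter] > 0."""
    rng = random.Random(seed)
    letters = list(p)
    weights = [p[a] for a in letters]
    total = 0.0
    for _ in range(nb_seq):
        seq = rng.choices(letters, weights=weights, k=length)
        total += sum(math.log(p[a] / q[a]) for a in seq)
    return total / (nb_seq * length)

p = {"A": 0.7, "B": 0.3}
q = {"A": 0.5, "B": 0.5}
estimate = kl_monte_carlo(p, q)

# Closed-form value for comparison, available here because the models are tiny.
exact = sum(p[a] * math.log(p[a] / q[a]) for a in p)
```

For models with longer priors no such closed form is practical, which is presumably why a simulation is used.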
Input-Output
Specific format is: lines of prior|posterior and count, separated
by a whitespace.
Example:
A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5
In this example, following an A, the proportion of B is 0.3 and the
proportion of A is 0.7. The overall proportion of A is 0.1, and that
of B is 0.9. The proportion of sequences beginning with A is 0.5, as
is the proportion of sequences beginning with B.
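A small parser shows how such lines map onto priors and posteriors. It is a sketch, not the module's own reader: the function parse_proportions is hypothetical, and it assumes each line is written as prior|posterior, then whitespace, then the number.

```python
def parse_proportions(text):
    """Hypothetical helper: parse lines of the form 'prior|posterior value'
    into a nested dict prior -> {posterior: value}."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        word, value = line.rsplit(None, 1)      # the last field is the number
        prior, posterior = word.split("|", 1)   # an empty prior gives ""
        props.setdefault(prior.strip(), {})[posterior.strip()] = float(value)
    return props

example = """A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5"""

props = parse_proportions(example)
```

The empty prior collects the overall proportions, and the prior "^" collects the proportions at the beginning of a sequence.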
- __str__
- outputs in specific format;
- loglex
- returns the corresponding Descripteur. See read_prop in that
class;
Sequence generation
From a Proportion, a (part of a)
Sequence can be generated randomly,
by the method read_prop.
The process is:
- for all increasing positions i:
  - get the longest word w ending in i-1 that is a valid prior
    (using method has_prefix);
  - if there is a posterior corresponding to w, let lp be the list
    of corresponding couples [posterior,proportion] (using method
    next); otherwise, lp is the uniform distribution over all
    letters of the Proportion;
  - randomly choose a posterior p according to the factors in list
    lp (see below);
  - if the first letter of p is a terminating character (^), put
    the null character ('\0') at that position, and exit;
    otherwise put that letter at position i on the Sequence.
In fact, each posterior is chosen with a probability proportional to
its proportion, even if the proportions do not sum to 1. Sequence
generation is therefore possible even with unorthodox
(non-normalised) proportions.
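The generation process can be sketched in plain Python as follows. This is an illustration only, not the read_prop method: the function generate is hypothetical, proportions are held in a dict prior -> {posterior: weight}, the sequence is assumed to start in the context "^", and the sketch simply stops at a terminating character instead of writing a null character.

```python
import random

def generate(props, length, rng=None):
    """Hypothetical sketch: generate up to `length` letters from
    props, a dict prior -> {posterior: weight}."""
    rng = rng or random.Random()
    # Letters that can be emitted, used for the uniform fallback.
    alphabet = sorted({post[0] for posts in props.values()
                       for post in posts if post[0] != "^"})
    lmax = max(len(prior) for prior in props)
    seq = "^"                          # "^" marks the beginning of the sequence
    for _ in range(length):
        # Longest suffix of what has been generated that is a known prior.
        w = None
        for k in range(min(lmax, len(seq)), -1, -1):
            suffix = seq[len(seq) - k:]
            if suffix in props:
                w = suffix
                break
        if w is not None:
            posts, weights = zip(*props[w].items())
        else:                          # no known prior: uniform over the letters
            posts, weights = alphabet, [1.0] * len(alphabet)
        # Weights need not sum to 1: choices() only uses their ratios.
        p = rng.choices(posts, weights=weights, k=1)[0]
        if p[0] == "^":                # terminating character: stop here
            break
        seq += p[0]
    return seq[1:]                     # drop the leading "^"

# Weights 2.0 and 5.0 are deliberately not normalised.
props = {"^": {"A": 1.0}, "A": {"B": 2.0}, "B": {"A": 5.0}}
seq = generate(props, 6)               # "ABABAB", since each prior has one posterior
```

Because random.choices works with relative weights, no normalisation step is needed before drawing, which mirrors the remark above about proportions that do not sum to 1.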