
class Proportion



module compte


This class stores the proportions of words that follow given words. For example, it can store the fact that the proportions of letters following the word AC are:
A   0.34
C   0.15
G   0.23
T   0.28

A distinction is thus made between prior words (such as AC here) and posterior words (such as A, C, G and T here).

The special character ^ is used to represent beginnings and ends of sequences.
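To make the prior/posterior distinction concrete, here is a minimal sketch of one possible in-memory layout: a dictionary mapping each prior word to a dictionary of posterior proportions. This layout and the helper `proportion` are hypothetical illustrations, not the internal structure of class Proportion; only the AC values come from the example above.

```python
# Hypothetical layout: prior word -> {posterior word: proportion}.
# Not the actual internals of class Proportion.
proportions = {
    "AC": {"A": 0.34, "C": 0.15, "G": 0.23, "T": 0.28},
}


def proportion(table, prior, posterior):
    """Return the proportion of `posterior` after `prior`, or 0.0 if absent."""
    return table.get(prior, {}).get(posterior, 0.0)


print(proportion(proportions, "AC", "G"))  # 0.23
```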

Construction

__init__
optional keyword fic allows construction by reading a file in the specific format (see Input-Output);
read_nf
builds from a file in the specific format;
read_Compte
builds from a Compte.

Optional keywords:
lpost=l
specifies the length of the posterior words, i.e. the words whose frequencies are computed. Default: the maximum length of the words;
lprior=l
specifies the length of the prior words, i.e. the words on which the computed words depend, in a Markovian context. Default: 0;

Since the special character ^ stands for the limits of the sequence, words ending with this symbol are counted as words of the given length or longer.

Handling

__getitem__
returns the string, in Compte format, of the posteriors corresponding to the given prior;
__iadd__
adds to self the proportions of another Proportion;
KL_MC
computes the Kullback-Leibler distance to another Proportion by Monte Carlo simulation on several Sequences (default: 100) of a given length (default: 1000), generated by method read_prop of Sequence. See Sequence generation;
lg_max
returns the length of the longest word, prior or posterior;
alph
returns the list of the letters used in the counts;
next
returns a list of [posterior,proportion] for the specified prior;
has_prefix
returns True if the specified word is a valid prior. Remember that the character ^ stands for the beginning or end of a sequence.
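As an illustration of the handling operations, here is a sketch over a plain dictionary (prior word mapped to posterior proportions), using the table from the Input-Output example. The functions `next_of`, `has_prefix`, `alph` and `add_in_place` are hypothetical stand-ins for next, has_prefix, alph and __iadd__; they are not the class's code. Two assumptions are flagged in the comments: a valid prior is taken to mean a prior present in the table, and ^ is excluded from the alphabet.

```python
import copy


def next_of(table, prior):
    """List of [posterior, proportion] for `prior` (like `next`)."""
    return [[post, p] for post, p in table.get(prior, {}).items()]


def has_prefix(table, word):
    """Assumption: a word is a valid prior if it appears as a prior."""
    return word in table


def alph(table):
    """Sorted letters used in priors and posteriors; '^' is excluded (assumption)."""
    letters = set()
    for prior, posts in table.items():
        letters.update(prior)
        for post in posts:
            letters.update(post)
    letters.discard("^")
    return sorted(letters)


def add_in_place(table, other):
    """Add the proportions of `other` to `table` (like `__iadd__`)."""
    for prior, posts in other.items():
        target = table.setdefault(prior, {})
        for post, p in posts.items():
            target[post] = target.get(post, 0.0) + p
    return table


example = {
    "A": {"B": 0.3, "A": 0.7},
    "B": {"B": 0.5, "A": 0.5},
    "": {"A": 0.1, "B": 0.9},
    "^": {"A": 0.5, "B": 0.5},
}
print(next_of(example, "A"))  # [['B', 0.3], ['A', 0.7]]
print(alph(example))          # ['A', 'B']
```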

Input-Output

The specific format consists of lines, each holding a prior|posterior word followed by its count, separated by whitespace.

Example:
A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5

In this example, after an A, the proportion of B is 0.3 and that of A is 0.7. The overall proportion of A is 0.1, and that of B is 0.9. At the beginning of a sequence, the proportions of A and B are both 0.5.

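A minimal sketch of reading this format into a dictionary of priors, reusing the example lines above. The function `parse_proportions` is hypothetical and not the library's reader; skipping blank lines and lines without a | (for instance a possible description line) is an assumption.

```python
def parse_proportions(lines):
    """Hypothetical parser: 'prior|posterior value' lines into
    {prior: {posterior: value}}.  An empty prior ('|A') is kept as ''."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # assumed: skip blanks and non-data lines
        word, value = line.split()
        prior, posterior = word.split("|", 1)
        table.setdefault(prior, {})[posterior] = float(value)
    return table


text = """A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5"""
table = parse_proportions(text.splitlines())
print(table[""])  # {'A': 0.1, 'B': 0.9}
```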
__str__
outputs in the specific format;
loglex
returns the corresponding Lexique. See read_prop in that class;
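The reverse direction, writing a dictionary of priors back in the specific format, can be sketched as follows. `format_proportions` is a hypothetical helper illustrating what __str__ produces, not its implementation; the ordering of lines is simply the dictionary's insertion order.

```python
def format_proportions(table):
    """Hypothetical writer: one 'prior|posterior value' line per entry."""
    lines = []
    for prior, posts in table.items():
        for post, p in posts.items():
            lines.append(f"{prior}|{post} {p}")
    return "\n".join(lines)


print(format_proportions({"A": {"B": 0.3, "A": 0.7}, "": {"A": 0.1}}))
# A|B 0.3
# A|A 0.7
# |A 0.1
```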

Sequence generation

From a Proportion, a Sequence (or part of one) can be generated randomly with method read_prop of Sequence.

The random choice of each posterior is made with probabilities proportional to the respective proportions, even if the proportions do not sum to 1. Sequence generation is therefore possible even with non-normalized proportions.
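The sketch below illustrates both proportional sampling and a Monte Carlo estimate of the Kullback-Leibler distance in the spirit of KL_MC. It is a standalone approximation under stated assumptions, not read_prop or KL_MC themselves: all function names are hypothetical, priors near the start are shortened and prefixed with ^ (an assumed convention), and dividing by the total number of generated letters (a per-letter distance) is an assumed normalization. Python's `random.choices` treats weights as relative, which gives the "proportional even if the sum differs from 1" behaviour described above.

```python
import math
import random


def prior_at(seq, i, lprior):
    """Prior before position i; '^'-prefixed near the start (assumed)."""
    if lprior == 0:
        return ""
    if i < lprior:
        return "^" + seq[:i]
    return seq[i - lprior:i]


def generate(table, length, lprior, rng=random):
    """Draw each posterior with probability proportional to its weight."""
    seq = ""
    for i in range(length):
        posts = table[prior_at(seq, i, lprior)]
        # Weights are relative, so they need not sum to 1.
        seq += rng.choices(list(posts), weights=list(posts.values()))[0]
    return seq


def log_prob(table, seq, lprior):
    """Log-probability of seq, normalising each prior's proportions."""
    total = 0.0
    for i, letter in enumerate(seq):
        posts = table.get(prior_at(seq, i, lprior), {})
        s = sum(posts.values())
        p = posts.get(letter, 0.0) / s if s > 0 else 0.0
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def kl_mc(p_table, q_table, lprior, n=100, length=1000, seed=0):
    """Monte Carlo estimate of KL(P || Q), per generated letter (assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        seq = generate(p_table, length, lprior, rng)
        total += log_prob(p_table, seq, lprior) - log_prob(q_table, seq, lprior)
    return total / (n * length)


example = {
    "A": {"B": 0.3, "A": 0.7},
    "B": {"B": 0.5, "A": 0.5},
    "^": {"A": 0.5, "B": 0.5},
}
print(generate(example, 20, lprior=1, rng=random.Random(1)))
```

Using a seeded `random.Random` instance keeps the estimate reproducible; a distance of infinity signals that Q gives zero probability to a sequence that P can generate.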

