
class Proportion

module compte

The documentation is here.

This class stores the proportions of words that follow given words. For example, it can record that the proportions of letters following the word AC are:
A   0.34
C   0.15
G   0.23
T   0.28

A distinction is thus made between prior words (such as AC here) and posterior words (such as A, C, G and T here).

The special character ^ is used to represent beginnings and ends of sequences.
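
As an illustration only, the AC example above can be held in a nested mapping from prior word to posterior word to proportion. The sketch below uses plain Python dictionaries; the names `proportions` and `posteriors_of` are hypothetical and are not part of the compte module, whose internal layout is not described here.

```python
# Hypothetical in-memory layout: prior word -> {posterior word: proportion}.
# This is an assumption for illustration, not the class's real internals.
proportions = {
    "AC": {"A": 0.34, "C": 0.15, "G": 0.23, "T": 0.28},
}

def posteriors_of(table, prior):
    """Return [posterior, proportion] pairs for a prior, sorted by posterior."""
    return [[post, p] for post, p in sorted(table.get(prior, {}).items())]

pairs = posteriors_of(proportions, "AC")
total = sum(p for _, p in pairs)  # the four proportions add up to 1
```

An unknown prior simply yields an empty list in this sketch; how the real class reports a missing prior is not stated in this page.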


Optional keyword fic allows construction by reading from a file, given its name, in the specific format;
builds from a file, given its name, in the specific format;
builds from a Compte.

Optional keywords:
specifies the length of the posterior words, i.e. the words whose frequencies are computed. Default: the maximum length of the words;
specifies the length of the prior words, i.e. the words on which the computed words depend, in a Markovian context. Default: 0;

Since the special character "^" stands for the limits of the sequence, words ending with this symbol are counted as words of "the given length or longer".
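
The role of the two lengths can be sketched in plain Python: count every posterior word of a given length that follows every prior word of a given length, then normalise per prior. The function `conditional_proportions` and its parameter names `lprior` and `lpost` are hypothetical, and this sketch works on a raw string and ignores the "^" sequence limits, so it is not the library's construction from a Compte.

```python
from collections import Counter, defaultdict

def conditional_proportions(seq, lprior=0, lpost=1):
    """Proportions of lpost-letter words following each lprior-letter word."""
    counts = defaultdict(Counter)
    width = lprior + lpost
    for i in range(len(seq) - width + 1):
        prior = seq[i:i + lprior]               # empty string when lprior is 0
        posterior = seq[i + lprior:i + width]
        counts[prior][posterior] += 1
    table = {}
    for prior, posts in counts.items():
        total = sum(posts.values())
        table[prior] = {w: n / total for w, n in posts.items()}
    return table

order0 = conditional_proportions("AABAB")            # no context
order1 = conditional_proportions("AABAB", lprior=1)  # one-letter context
```

With a prior length of 0 all counts fall under the empty prior, which corresponds to overall proportions.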


returns the string, in the format of Compte, of the posteriors corresponding to the given prior;
adds the proportions of another Proportion to self;
computes the KL (Kullback-Leibler) distance to a Proportion, by Monte Carlo simulation on several (default: 100) Sequences of a given length (default: 1000) generated by the method read_prop of Sequence. See Sequence generation;
returns the length of the longest word, prior or posterior;
returns the list of the letters used in the counts;
returns a list of [posterior,proportion] for the specified prior;
returns True if the specified word is a valid prior. Remember that the character ^ stands for the beginning or end of a sequence.
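
The Monte Carlo estimate of the KL distance mentioned in the list above can be sketched as follows: simulate sequences under the first model, then average the difference of log-likelihoods under both models. This is a minimal sketch under several assumptions: tables are plain dictionaries with one-letter priors, "^" is the start context, each row sums to 1, and the result is divided by the number of letters. Whether the library normalises in this way is not stated here, and all function names are hypothetical.

```python
import math
import random

def sample(table, length, rng):
    """Draw one sequence from a first-order table; '^' is the start context."""
    seq, prior = [], "^"
    for _ in range(length):
        words = list(table[prior])
        letter = rng.choices(words, weights=[table[prior][w] for w in words])[0]
        seq.append(letter)
        prior = letter
    return seq

def log_likelihood(table, seq):
    """Sum of log proportions of each letter given the previous one."""
    total, prior = 0.0, "^"
    for letter in seq:
        total += math.log(table[prior][letter])
        prior = letter
    return total

def kl_monte_carlo(p, q, n_seq=100, length=1000, seed=0):
    """Estimate KL(p || q) per letter from sequences simulated under p."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_seq):
        seq = sample(p, length, rng)
        acc += log_likelihood(p, seq) - log_likelihood(q, seq)
    return acc / (n_seq * length)

# Every row is identical, so letters are independent and the exact value is known.
row_p = {"A": 0.7, "B": 0.3}
row_q = {"A": 0.5, "B": 0.5}
p = {"^": row_p, "A": row_p, "B": row_p}
q = {"^": row_q, "A": row_q, "B": row_q}
estimate = kl_monte_carlo(p, q)
exact = sum(row_p[x] * math.log(row_p[x] / row_q[x]) for x in row_p)
```

Because the sequences are drawn under the first model only, the estimate is not symmetric in its two arguments.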


The specific format is: each line holds prior|posterior and its value, separated by whitespace. For example:
A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5

In this example, following an A, the proportion of B is 0.3 and the proportion of A is 0.7. The overall proportion of A is 0.1, and of B is 0.9. The proportion of sequences beginning with A is 0.5, as is the proportion beginning with B.
outputs in specific format;
returns the corresponding Lexique. See read_prop in that class;
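
To make the specific format concrete, here is a small stand-alone reader and writer for such lines. It is a sketch in plain Python, not the module's own input and output methods; `parse_proportions` and `format_proportions` are hypothetical names, and the nested dictionary is an assumed representation.

```python
def parse_proportions(text):
    """Parse 'prior|posterior value' lines into {prior: {posterior: value}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, value = line.split()          # split on whitespace
        prior, posterior = word.split("|")  # an empty prior means "no context"
        table.setdefault(prior, {})[posterior] = float(value)
    return table

def format_proportions(table):
    """Write the table back, one 'prior|posterior value' line per entry."""
    return "\n".join(
        f"{prior}|{post} {value}"
        for prior, posts in table.items()
        for post, value in posts.items()
    )

example = """A|B 0.3
A|A 0.7
B|B 0.5
B|A 0.5
|A 0.1
|B 0.9
^|A 0.5
^|B 0.5"""

table = parse_proportions(example)
```

The empty prior (lines starting with `|`) and the `^` prior are stored like any other key, so overall and sequence-start proportions need no special handling in this representation.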

Sequence generation

From a Proportion, a (part of a) Sequence can be generated randomly, by the method read_prop.

The process is: the random choice of the posterior is made with probabilities proportional to the respective proportions, even if the proportions do not sum to 1. Sequence generation is therefore possible even with non-normalised proportions.
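
Choosing a posterior with probability proportional to its proportion, without requiring the proportions to sum to 1, can be sketched with the standard library's `random.choices`, which normalises the weights itself. The functions `choose_posterior` and `generate` are hypothetical, the table is an assumed nested dictionary with one-letter priors, and using "^" as the start context is an assumption; this is not the read_prop method itself.

```python
import random

def choose_posterior(posteriors, rng):
    """Pick one posterior with probability proportional to its weight."""
    words = list(posteriors)
    weights = [posteriors[w] for w in words]
    # random.choices normalises the weights itself, so they need not sum to 1.
    return rng.choices(words, weights=weights, k=1)[0]

def generate(table, length, rng):
    """Generate `length` letters from a first-order table with a '^' start."""
    sequence = []
    prior = "^"  # assumed convention: '^' is the context at sequence start
    for _ in range(length):
        letter = choose_posterior(table[prior], rng)
        sequence.append(letter)
        prior = letter
    return "".join(sequence)

# Proportions that deliberately do not sum to 1.
table = {
    "^": {"A": 2.0, "B": 2.0},
    "A": {"A": 3.0, "B": 1.0},
    "B": {"A": 1.0, "B": 1.0},
}
rng = random.Random(0)
seq = generate(table, 20, rng)
```

Only the ratios between the proportions of a given prior matter here: after an A, the next letter is A three times out of four on average.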
