
class Matrice



module matrice


This class stores sequences of vectors indexed by letters or character numbers (i.e. numbers between 0 and 255, prefixed by a #). For example, such vectors can hold letter frequencies.

Through the ASCII code, character numbers and letters are equivalent.
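
The equivalence between letters, character numbers and #-prefixed numbers can be sketched in plain Python. This is only an illustration of the naming rule described above: the helper descriptor_code is hypothetical and is not part of the matrice module.

```python
def descriptor_code(desc):
    """Map a letter, an integer, or a '#'-prefixed number to a code in 0..255.

    Hypothetical helper written for illustration; the matrice module
    performs its own conversion internally.
    """
    if isinstance(desc, int):
        code = desc
    elif isinstance(desc, str) and len(desc) > 1 and desc.startswith("#"):
        # '#97' denotes the character number 97
        code = int(desc[1:])
    elif isinstance(desc, str) and len(desc) == 1:
        # a single letter is identified with its ASCII code
        code = ord(desc)
    else:
        raise ValueError("not a descriptor: %r" % (desc,))
    if not 0 <= code <= 255:
        raise ValueError("character number out of range: %d" % code)
    return code


# The three spellings below designate the same descriptor.
same = {descriptor_code("a"), descriptor_code(97), descriptor_code("#97")}
```

Under this rule, 'a', 97 and '#97' all resolve to the same code, which is why they can be used interchangeably as descriptor arguments.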

Positions in a Matrice are numbered from 0 to length-1.

Construction

__init__
The optional keyword fic allows construction by reading a file.
A Matrice is a two-dimensional table in C++. The Matrice format is described in Input-Output.

For construction, the memory for a Matrice must be allocated by either:
generate
generates an empty Matrice of a given length from a given list of letters and/or numbers (between 0 and 255);
read_nf
reads the named file, which must be in the specific format;
copy
copies another Matrice into this new one;
compress
compresses data by either summing or averaging the occurrences of the letters over non-overlapping windows of a specified size. If the last window is incomplete, it is not stored;
prediction
computes, at each position of the data, the predictions of the descriptors of a Lexique.
The numbers of the descriptors of the Lexique are set between 0 and 255, if needed.
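
The windowing rule of compress can be sketched as follows. This is a minimal illustration, assuming the data are already available as rows of per-descriptor values; the function name compress_rows and its parameters are hypothetical and do not reproduce the signature of the compress method.

```python
def compress_rows(rows, size, average=False):
    """Sum (or average) rows over non-overlapping windows of `size` rows.

    Hypothetical sketch of the rule described for compress: an incomplete
    last window is dropped rather than stored.
    """
    if size <= 0:
        raise ValueError("window size must be positive")
    n_full = len(rows) // size          # number of complete windows
    out = []
    for w in range(n_full):
        window = rows[w * size:(w + 1) * size]
        # column-wise sum over the rows of the window
        values = [sum(col) for col in zip(*window)]
        if average:
            values = [v / size for v in values]
        out.append(values)
    return out


rows = [[0, 1], [2, 1], [3, 4], [1, 4], [5, 1]]
summed = compress_rows(rows, 2)                  # the fifth row is dropped
averaged = compress_rows(rows, 2, average=True)
```

With five rows and a window of size 2, only two windows are complete, so the result has two rows.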

derivate
computes the differences between successive positions of the data;
integrate
computes the cumulated sums of the data, starting from the first position;
fb
in HMM context, uses Forward-Backward algorithm on a Sequence using a Lexique.
The numbers of the descriptors of the Lexique are set between 0 and 255, if needed.
For each descriptor, at each position, the value set is the log-probability of the occurrence of this descriptor, given the HMM and the data.
backward
in HMM context, uses Backward algorithm on a Sequence using a Lexique.
The numbers of the descriptors of the Lexique are set between 0 and 255, if needed.
For each descriptor, at each position, the value is the log-probability of the post-position part of the data, given the HMM and the descriptor at this position [Rab89];
forward
in HMM context, uses Forward algorithm on a Sequence using a Lexique.
The numbers of the descriptors of the Lexique are set between 0 and 255, if needed.
For each descriptor, at each position, the value is the log-probability of the ante-position part of the data and of the descriptor at this position, given the HMM [Rab89];
set_proba
normalizes each line so that the values are the logarithms of probabilities. If the former values are (x_i)_i, the new ones are: x_i − log(Σ_j exp(x_j));
exp
replaces the values by their exponential;
shuffle
randomly shuffles the Matrice by (len*(log(len)+1)/2) random transpositions;
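
The normalization performed by set_proba, followed by exp, can be illustrated on a single line of values. This is a sketch of the formula only, not the module's implementation; the function name log_normalize is hypothetical, and subtracting the maximum before exponentiating is a common numerical precaution added here as an assumption.

```python
import math


def log_normalize(row):
    """Return the values x - log(sum(exp(x))) for one line of values.

    Hypothetical helper: the maximum is factored out of the sum so that
    exp() does not overflow on large inputs.
    """
    m = max(row)
    log_total = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_total for x in row]


line = [0.0, math.log(3.0)]                  # unnormalized log-weights 1 and 3
log_probs = log_normalize(line)              # what set_proba would store
probs = [math.exp(x) for x in log_probs]     # what exp would then store
```

After normalization, the exponentials of the values of a line sum to 1, so they can be read as probabilities.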

Handling

__len__
returns the length of the Matrice;
n_desc
returns the number of descriptors;
desc
returns the list of the descriptors;
__getslice__
returns a sub-matrice.
Beware: this operator does NOT create a new Matrice object, but only a shallow copy that shares its values with the original, hence it must be used with care;
val
returns the value on a letter at a position. The first argument of this function can be either a letter or a number. For example, val('a',1) is the same as val(97,1) and val('#97',1);
g_val
sets the value on a letter at a position in the Matrice. The first argument is the new value; the second argument can be either a letter or a number. For example, g_val(0.5,'a',1) is the same as g_val(0.5,97,1) and g_val(0.5,'#97',1);
max
returns the maximum value at a given descriptor.
For example:
>>> import matrice
>>> m=matrice.Matrice(fic="es.mat")
>>> print m
5
#20     B               
0       1       
2       1       
3       4       
1       4       
5       1       

>>> len(m)
5
>>> m.desc()
['#20', 'B']
>>> m.max(20)
5
>>> n=m[1:4]
>>> print n
3
#20     B               
2       1       
3       4       
1       4       

>>> n.val('#20',1)
3.0
>>> n.g_val(7,20,1)
>>> print n
3
#20     B               
2       1       
7       4       
1       4       

>>> print m
5
#20     B               
0       1       
2       1       
7       4       
1       4       
5       1       
__add__
returns a NEW Matrice holding the sums of the corresponding values of both Matrices, provided they have the same length and descriptors;
__iadd__
adds to the values of the first Matrice the corresponding values of the second one, provided both Matrices have the same length and descriptors;
__sub__
returns a NEW Matrice holding the differences of the corresponding values of both Matrices, provided they have the same length and descriptors;
__isub__
subtracts from the values of the first Matrice the corresponding values of the second one, provided both Matrices have the same length and descriptors;
line
returns a dictionary whose keys are the descriptors of the Matrice and whose values are the values at the specified line.
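
The warning attached to __getslice__ has a close analogue in plain Python, sketched here with nested lists rather than with a Matrice. It only illustrates what sharing values through a shallow copy means; how the C++ object shares its storage is not described in this page.

```python
import copy

# Rows of a small table, standing in for the lines of a Matrice.
rows = [[0, 1], [2, 1], [3, 4], [1, 4], [5, 1]]

shallow = rows[1:4]      # new outer list, but the same row objects
shallow[1][0] = 7        # writes through to rows[2]

independent = copy.deepcopy(rows[1:4])   # the rows themselves are duplicated
independent[0][0] = 99                   # rows is left untouched
```

When an independent sub-matrice is needed, copying it first (see copy in Construction) avoids modifying the original by accident.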

Input-Output

The specific format is:
description
length of the Matrice
letters separated by spaces or tabs
rows of values separated by spaces or tabs

example
5
A    C    B
3.09 4.5 3
2 0 0
1 0 0
1.19302 2 5
0 0.322 19.202
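
A reader for this format can be sketched in a few lines of plain Python. It is an illustration of the layout described above, not the parser used by read_nf; the function name parse_matrice_text and the choice to skip blank lines are assumptions.

```python
def parse_matrice_text(text):
    """Parse the specific format into (descriptors, rows).

    Hypothetical sketch: the first line is the length, the second lists
    the descriptors, and each following line holds one value per descriptor.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    length = int(lines[0])
    descriptors = lines[1].split()          # split() handles spaces and tabs
    rows = [[float(v) for v in ln.split()] for ln in lines[2:2 + length]]
    if len(rows) != length:
        raise ValueError("expected %d lines of values, found %d"
                         % (length, len(rows)))
    for row in rows:
        if len(row) != len(descriptors):
            raise ValueError("line does not match the number of descriptors")
    return descriptors, rows


example = """5
A    C    B
3.09 4.5 3
2 0 0
1 0 0
1.19302 2 5
0 0.322 19.202
"""
descriptors, rows = parse_matrice_text(example)
```

Here descriptors holds the three letters of the header line and rows holds five lists of three floats each.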


__str__
outputs the Matrice in the specific format, with tab-separated columns.
