class Matrice
module matrice
This class stores sequences of vectors indexed by letters or by
character numbers (i.e. numbers between 0 and 255, prefixed by a #).
For example, such vectors can be letter frequencies.
Via ASCII codes, character numbers and letters are equivalent.
Positions in a Matrice are numbered from 0 to length-1.
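The equivalence between letters, character numbers and #-prefixed numbers can be pictured with a small helper. This is only a sketch in plain Python: it does not use the matrice module, and the name desc_code is hypothetical.

```python
def desc_code(d):
    """Return the character number (0..255) behind a descriptor.

    Accepts a single letter ('a'), an integer (97) or a '#'-prefixed
    number ('#97'). Hypothetical helper, not part of the matrice module.
    """
    if isinstance(d, int):
        code = d
    elif isinstance(d, str) and len(d) > 1 and d.startswith("#"):
        code = int(d[1:])          # '#97' -> 97
    elif isinstance(d, str) and len(d) == 1:
        code = ord(d)              # 'a' -> 97 via its ASCII code
    else:
        raise ValueError("not a valid descriptor: %r" % (d,))
    if not 0 <= code <= 255:
        raise ValueError("character number out of range: %d" % code)
    return code


print(desc_code('a'), desc_code(97), desc_code('#97'))  # 97 97 97
```

All three spellings resolve to the same character number, which is why they can be used interchangeably as descriptors.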
Construction
- __init__
- The optional keyword fic allows
construction by reading a file.
Matrice is a two-dimensional table in C++. The
Matrice format is described in
Input-Output.
For construction, the memory for a Matrice must be allocated
by either:
- generate
- generates an empty Matrice of a given
length from a given list of letters and/or numbers (between 0 and 255);
- read_nf
- reads a file, given its name, in the specific
format;
- copy
- copies another Matrice into this new one;
- compress
- compresses
data by either summing or averaging the
occurrences of the letters over non-overlapping windows of a
specified size. If the last window is incomplete, it is not stored;
- prediction
- computes, at each position of the
data, the predictions of the
descriptors of a
Lexique.
The numbers of the descriptors of the Lexique are set
between 0 and 255, if needed;
- derivate
- computes from data
the differences between successive positions;
- integrate
- computes from data
the cumulated sums from the first position;
- fb
- in an HMM context, runs the Forward-Backward
algorithm on a Sequence using a
Lexique.
The numbers of the descriptors of the Lexique are set
between 0 and 255, if needed.
For each descriptor, at each position,
the value set is the log-probability of the occurrence of this
descriptor, given the HMM and the data.
- backward
- in an HMM context, runs the
Backward algorithm on a Sequence using
a Lexique.
The numbers of the descriptors of the Lexique are set
between 0 and 255, if needed.
For each descriptor, at each position,
the value is the log-probability of the post-position part of the
data, given the HMM and the descriptor
at this position [Rab89];
- forward
- in an HMM context, runs the
Forward algorithm on a Sequence using
a Lexique.
The numbers of the descriptors of the Lexique are set
between 0 and 255, if needed.
For each descriptor, at each position,
the value is the log-probability of the ante-position part of the
data and of the descriptor at this
position, given the HMM [Rab89];
- set_proba
- normalizes each line so that the values
are the logarithms of probabilities. If the former values are
$(x_i)_i$, the new ones are: $x_i - \log\left(\sum_j \exp(x_j)\right)$;
- exp
- replaces the values by their exponential;
- shuffle
- randomly shuffles the Matrice by applying
(len*(log(len)+1)/2) random transpositions.
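The purely numerical constructors (compress, derivate, integrate, set_proba) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the C++ implementation: a Matrice is modelled as a list of rows (one list of values per position), the function signatures are hypothetical, and the way the real derivate treats the first position is not specified here, so the sketch simply returns one row fewer.

```python
import math


def compress(rows, size, average=False):
    """Sum (or average) rows over non-overlapping windows of `size`.

    An incomplete last window is dropped, as described above.
    """
    out = []
    for start in range(0, len(rows) - size + 1, size):
        window = rows[start:start + size]
        sums = [sum(col) for col in zip(*window)]
        out.append([s / size for s in sums] if average else sums)
    return out


def derivate(rows):
    """Differences between successive positions (assumed: one row fewer)."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(rows, rows[1:])]


def integrate(rows):
    """Cumulated sums from the first position."""
    if not rows:
        return []
    out, acc = [], [0] * len(rows[0])
    for row in rows:
        acc = [a + x for a, x in zip(acc, row)]
        out.append(acc)
    return out


def set_proba(rows):
    """Replace each x_i of a row by x_i - log(sum_j exp(x_j))."""
    out = []
    for row in rows:
        m = max(row)  # subtract the maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append([x - log_z for x in row])
    return out


rows = [[0, 1], [2, 1], [3, 4], [1, 4], [5, 1]]
print(compress(rows, 2))   # [[2, 2], [4, 8]]  (fifth row dropped)
print(derivate(rows))      # [[2, 0], [1, 3], [-2, 0], [4, -3]]
print(integrate(rows))     # [[0, 1], [2, 2], [5, 6], [6, 10], [11, 11]]
```

After set_proba, the exponentials of each row sum to 1, which is what "logarithms of probabilities" means in the description above.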
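The quantities filled in by forward, backward and fb can be illustrated on a toy HMM. The sketch below follows the usual conventions of [Rab89] in log space; it is not the library's code. The Lexique and the Sequence are replaced by plain dictionaries and a string, and every name and number in it is made up for the example.

```python
import math


def logsumexp(xs):
    """Stable log(sum(exp(x))) over a list of log-values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward(obs, states, log_init, log_trans, log_emit):
    """fwd[t][s] = log P(obs[0..t], state at t is s | HMM)."""
    fwd = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = fwd[-1]
        fwd.append({s: log_emit[s][o]
                       + logsumexp([prev[r] + log_trans[r][s] for r in states])
                    for s in states})
    return fwd


def backward(obs, states, log_trans, log_emit):
    """bwd[t][s] = log P(obs[t+1..] | state at t is s, HMM)."""
    bwd = [{s: 0.0 for s in states}]  # nothing left to emit after the end
    for o in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {s: logsumexp([log_trans[s][r] + log_emit[r][o] + nxt[r]
                                     for r in states])
                       for s in states})
    return bwd


def fb(obs, states, log_init, log_trans, log_emit):
    """post[t][s] = log P(state at t is s | obs, HMM)."""
    fwd = forward(obs, states, log_init, log_trans, log_emit)
    bwd = backward(obs, states, log_trans, log_emit)
    log_px = logsumexp([fwd[-1][s] for s in states])  # log P(obs | HMM)
    return [{s: f[s] + b[s] - log_px for s in states}
            for f, b in zip(fwd, bwd)]


def logs(d):
    """Turn a (possibly nested) dict of probabilities into log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in d.items()}


# Toy two-state HMM; all values are invented for the illustration.
states = ["A", "B"]
log_init = logs({"A": 0.5, "B": 0.5})
log_trans = logs({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}})
log_emit = logs({"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}})

post = fb("xyyx", states, log_init, log_trans, log_emit)
for t, row in enumerate(post):
    print(t, {s: round(math.exp(v), 3) for s, v in row.items()})
```

At every position the forward and backward values add up, over the states, to the same log-probability of the whole data, so the fb values are properly normalized log-probabilities.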
Handling
- __len__
- returns the length of the
Matrice;
- n_desc
- returns the number of
descriptors;
- desc
- returns the list of the
descriptors;
- __getslice__
- returns a sub-Matrice.
Beware: this operator does NOT copy the data into a new Matrice;
it only returns a shallow copy that shares its values with the
original, so it must be used with care;
- val
- returns the value on a letter at a position. The
first argument of this function can be either a letter or a number. For
example, val('a',1) is the same as val(97,1) and
val('#97',1);
- g_val
- changes a value on a letter at a
position in the Matrice. The second argument of this
function can be either a letter or a number. For example,
g_val(0.5,'a',1) is the same as g_val(0.5,97,1) and
g_val(0.5,'#97',1);
- max
- returns the maximum value at a given descriptor.
For example:
>>> import matrice
>>> m=matrice.Matrice(fic="es.mat")
>>> print m
5
#20 B
0 1
2 1
3 4
1 4
5 1
>>> len(m)
5
>>> m.desc()
['#20', 'B']
>>> m.max(20)
5
>>> n=m[1:4]
>>> print n
3
#20 B
2 1
3 4
1 4
>>> n.val('#20',1)
3.0
>>> n.g_val(7,20,1)
>>> print n
3
#20 B
2 1
7 4
1 4
>>> print m
5
#20 B
0 1
2 1
7 4
1 4
5 1
- __add__
- returns a NEW Matrice which is
the sum of corresponding values in both Matrices, if those
Matrices have the same length and descriptors;
- __iadd__
- adds to the values of the first
Matrice the corresponding values from the second one, if
both Matrices have the same length and descriptors;
- __sub__
- returns a NEW Matrice which is
the difference of corresponding values in both Matrices,
if those Matrices have the same length and descriptors;
- __isub__
- subtracts from the values of the first
Matrice the corresponding values from the second one, if
both Matrices have the same length and descriptors;
- line
- returns a dictionary whose keys are the
descriptors of the Matrice and whose items are the
values at the specified line.
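The difference between the operators that return a NEW Matrice and the in-place ones, as well as the dictionary returned by line, can be shown on a small stand-in class. ToyMatrice is hypothetical and written in plain Python. Raising ValueError when lengths or descriptors differ is an assumption, since the behaviour of the real class in that case is not described here.

```python
class ToyMatrice:
    """Hypothetical stand-in: a list of descriptors plus one row per position."""

    def __init__(self, descs, rows):
        self.descs = list(descs)
        self.rows = [list(r) for r in rows]

    def _check(self, other):
        # Assumed behaviour: refuse operands of different shape.
        if self.descs != other.descs or len(self.rows) != len(other.rows):
            raise ValueError("length or descriptors differ")

    def _combine(self, other, sign):
        return [[x + sign * y for x, y in zip(a, b)]
                for a, b in zip(self.rows, other.rows)]

    def __add__(self, other):      # returns a NEW object
        self._check(other)
        return ToyMatrice(self.descs, self._combine(other, +1))

    def __sub__(self, other):      # returns a NEW object
        self._check(other)
        return ToyMatrice(self.descs, self._combine(other, -1))

    def __iadd__(self, other):     # modifies self in place
        self._check(other)
        self.rows = self._combine(other, +1)
        return self

    def __isub__(self, other):     # modifies self in place
        self._check(other)
        self.rows = self._combine(other, -1)
        return self

    def line(self, pos):
        """Dictionary mapping each descriptor to its value at `pos`."""
        return dict(zip(self.descs, self.rows[pos]))


m = ToyMatrice(['#20', 'B'], [[0, 1], [2, 1], [3, 4]])
n = ToyMatrice(['#20', 'B'], [[1, 1], [1, 1], [1, 1]])
s = m + n            # new object, m is left unchanged
m -= n               # m itself is modified
print(s.line(2))     # {'#20': 4, 'B': 5}
```

Here s holds the sums in a separate object, while the -= statement changes the values stored in m.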
Input-Output
The specific format is:
description | example
length of the Matrice | 5
letters separated by spaces or tabulations | A C B
arrays of values separated by spaces or tabulations | 3.09 4.5 3
 | 2 0 0
 | 1 0 0
 | 1.19302 2 5
 | 0 0.322 19.202
- __str__
- outputs the Matrice in the specific format, with
tab-separated columns.
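The format described in this section is simple enough to read and write by hand. The functions below are a minimal sketch in plain Python, independent of the matrice module. The names parse_matrice and format_matrice are hypothetical, and the number formatting used on output is an assumption.

```python
def parse_matrice(text):
    """Parse the specific format: length, descriptors, then one row per position."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    length = int(lines[0])
    descs = lines[1].split()          # spaces or tabulations both work
    rows = [[float(v) for v in ln.split()] for ln in lines[2:2 + length]]
    if len(rows) != length or any(len(r) != len(descs) for r in rows):
        raise ValueError("malformed Matrice text")
    return descs, rows


def format_matrice(descs, rows):
    """Write the same format back, with tab-separated columns."""
    out = [str(len(rows)), "\t".join(descs)]
    out += ["\t".join("%g" % v for v in row) for row in rows]
    return "\n".join(out)


text = """5
A C B
3.09 4.5 3
2 0 0
1 0 0
1.19302 2 5
0 0.322 19.202
"""
descs, rows = parse_matrice(text)
print(descs)        # ['A', 'C', 'B']
print(rows[3])      # [1.19302, 2.0, 5.0]
print(format_matrice(descs, rows))
```

Parsing the output of format_matrice gives back the same descriptors and values, so the two functions are inverse of each other on this example.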