A simple descriptor can be seen as a function that is applied to a position in the data and its vicinity, and that returns a floating-point value.
In the data, at a given position, a letter has a value:
A floating-point value written between parentheses after a descriptor multiplies the prediction of this descriptor by that value. For example, on a Sequence,
descriptor A
returns 1 on A, and 0 elsewhere, whereas
descriptor A(0.7)
returns 0.7 on A, and 0 elsewhere.
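The effect of the coefficient can be illustrated with a minimal Python sketch. It is a standalone model, not the library's API: the name `letter_descriptor` is hypothetical, positions are taken as 1-based (as the examples in this document suggest), and returning 0 out of bounds is an assumption.

```python
def letter_descriptor(letter, coefficient=1.0):
    """Build a function (sequence, position) -> float for one letter.

    Hypothetical helper: models "A" and "A(0.7)", not the real parser.
    """
    def predict(sequence, position):
        # Positions are 1-based in this sketch; out of bounds gives 0 (assumption).
        in_bounds = 1 <= position <= len(sequence)
        hit = in_bounds and sequence[position - 1] == letter
        # The coefficient multiplies the basic 1/0 prediction.
        return coefficient * (1.0 if hit else 0.0)
    return predict


plain = letter_descriptor("A")        # models descriptor A
scaled = letter_descriptor("A", 0.7)  # models descriptor A(0.7)

print(plain("ACB", 1), plain("ACB", 2))    # 1.0 0.0
print(scaled("ACB", 1), scaled("ACB", 2))  # 0.7 0.0
```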
For operators, prefix notation is used.
The accepted descriptors are:
letter ::= "a"..."z"|"A"..."Z"
! returns 1 on any position (even if out of bounds);
^ returns 1 if the position is out of bounds, 0 otherwise.
special ::= "^" | "!"
Beware: as the codes of the special characters ! and ^ are 33 and 94, these codes must be used very cautiously.
character ::= #0..255
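As an illustration of the two special descriptors and of the character codes mentioned in the warning, here is a small Python sketch. The function names `always` and `out_of_bounds` are hypothetical, and 1-based positions are an assumption of the sketch.

```python
def always(sequence, position):
    """Model of "!": 1 on any position, even out of bounds."""
    return 1.0


def out_of_bounds(sequence, position):
    """Model of "^": 1 if the position is out of bounds, 0 otherwise.

    Positions are 1-based here (an assumption of this sketch).
    """
    return 0.0 if 1 <= position <= len(sequence) else 1.0


print(always("AB", 0), always("AB", 1))            # 1.0 1.0
print(out_of_bounds("AB", 0), out_of_bounds("AB", 2),
      out_of_bounds("AB", 3))                      # 1.0 0.0 1.0

# The codes that collide with the special characters:
print(ord("!"), ord("^"))                          # 33 94
```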
here-plus ::= +(descriptors)
here-mult ::= *(descriptors)
|(`AB'A(0.1)`AC'A(0.2)`AA'A(0.3))
returns 0.1.
|(`AB'A(0.1)`AB'A(0.2))
returns 0.1.
here-or ::= |(descriptorsdescriptors)
`A(0.5)CB(0.3)'
returns 0.5.
forward ::= `descriptors'
|`BC(0.1)CC(0.2)AC(0.3)'
returns 0.1.
|`BC(-0.1)BC(0.2)'
returns −0.1.
When a computing descriptor is itself an "or"-operator (here-or, backward-or, or forward-or), the current position for the tests inside this computing descriptor is the following position. Yet, its own computing descriptors are used on the actual current position.
For example, on position 1 of Sequence CAB, the prediction of
|`B|`BC(0.1)CC(0.2)AC(0.3)'C|`BC(0.4)CC(0.5)AC(0.6)'A|`BC(0.7)CC(0.8)AC(0.9)''
is 0.7.
forward-or ::= |`descriptorsdescriptors'
{A(0.5)CB(0.3)}
returns 0.3.
backward ::= {descriptors}
backward-plus ::= +{descriptors}
|{BC(0.1)CC(0.2)AC(0.3)}
returns 0.1.
|{BC(-0.1)BC(0.2)}
returns −0.1.
When a computing descriptor is itself an "or"-operator (here-or, backward-or, or forward-or), the current position for the tests inside this computing descriptor is the preceding position. Yet, its own computing descriptors are used on the actual current position.
For example, on position 3 of Sequence ABC, the prediction of
|{B|{BC(0.1)CC(0.2)AC(0.3)}C|{BC(0.4)CC(0.5)AC(0.6)}A|{BC(0.7)CC(0.8)AC(0.9)}}
is 0.3.
backward-or ::= |{descriptorsdescriptors}
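The two nested "or" examples (0.7 on position 1 of CAB, 0.3 on position 3 of ABC) can be reproduced with the following Python sketch. It is a reading of the rules inferred from those examples, not the library's implementation. It assumes that positions are 1-based, that a forward-or applies its tests one position after the position it works from and a backward-or one position before, that the first test with a non-zero result selects its computing descriptor, and that the result is 0 when no test succeeds. The names `letter`, `or_op`, `inner` and `nested` are hypothetical.

```python
def letter(ch, coef=1.0):
    """Letter descriptor: coef where ch is found, 0 elsewhere."""
    def value(seq, pos, test_pos=None):
        ok = 1 <= pos <= len(seq) and seq[pos - 1] == ch
        return coef if ok else 0.0
    return value


def or_op(shift, *branches):
    """Model of an "or"-operator over (test, computing descriptor) pairs.

    shift = +1 models forward-or, shift = -1 models backward-or.
    """
    def value(seq, pos, test_pos=None):
        # Tests start from the position handed down by an enclosing "or",
        # or from the actual current position at top level.
        tp = (pos if test_pos is None else test_pos) + shift
        for test, compute in branches:
            if test(seq, tp, tp) != 0.0:
                # The computing descriptor stays on the actual position,
                # but a nested "or" runs its tests from tp.
                return compute(seq, pos, tp)
        return 0.0  # assumption: no successful test gives 0
    return value


def inner(shift, a, b, c):
    """Models |`BC(a)CC(b)AC(c)' (shift=+1) or |{BC(a)CC(b)AC(c)} (shift=-1)."""
    return or_op(shift,
                 (letter("B"), letter("C", a)),
                 (letter("C"), letter("C", b)),
                 (letter("A"), letter("C", c)))


def nested(shift):
    """Models the nested descriptors of the two examples."""
    return or_op(shift,
                 (letter("B"), inner(shift, 0.1, 0.2, 0.3)),
                 (letter("C"), inner(shift, 0.4, 0.5, 0.6)),
                 (letter("A"), inner(shift, 0.7, 0.8, 0.9)))


print(nested(+1)("CAB", 1))  # 0.7
print(nested(-1)("ABC", 3))  # 0.3
```

In the forward case, the outer test A succeeds on position 2, the inner test B succeeds on position 3, and C(0.7) is evaluated on position 1, which holds C. The backward case mirrors this with positions 2, 1 and 3.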
Nb: these descriptors have been built for specific needs (such as the translation of Markovian transition probabilities) but, owing to the C++ implementation, it is very easy to design new ones if necessary.
A pattern of descriptors is used in the context of maximum predictive partitioning. It is a word of successive simple descriptors, used periodically to compute predictions on data. The period starts with the first descriptor on the first position.
For example, as the prediction on the data is the sum of the predictions on all of its positions, the prediction on sequence ACBCAB of descriptor pattern AC is 4, and the prediction of descriptor pattern CA is 0.
On a position, the prediction value is the value returned by the descriptor used on that position.
On the data, the prediction of a simple descriptor is the sum of its predictions on all of the positions of the data.
In the case of a descriptor pattern, the descriptors are used periodically, starting with the first descriptor of the pattern at the first position.
For example, on sequence ACBCAB the prediction of descriptor pattern AC is 4, and the prediction of descriptor pattern CA is 0.
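The periodic use of a pattern can be checked with a short Python sketch. The helpers `letter` and `pattern_prediction` are hypothetical names used only for this illustration, and positions are assumed to be 1-based.

```python
def letter(ch, coef=1.0):
    """Letter descriptor: coef where ch is found, 0 elsewhere (1-based)."""
    def value(seq, pos):
        ok = 1 <= pos <= len(seq) and seq[pos - 1] == ch
        return coef if ok else 0.0
    return value


def pattern_prediction(pattern, seq):
    """Sum over all positions, cycling through the pattern's descriptors.

    The first descriptor of the pattern is used on the first position.
    """
    total = 0.0
    for i in range(len(seq)):
        descriptor = pattern[i % len(pattern)]
        total += descriptor(seq, i + 1)
    return total


ac = [letter("A"), letter("C")]  # models descriptor pattern AC
ca = [letter("C"), letter("A")]  # models descriptor pattern CA

print(pattern_prediction(ac, "ACBCAB"))  # 4.0
print(pattern_prediction(ca, "ACBCAB"))  # 0.0
```

A pattern of length one reduces to the prediction of a simple descriptor on the data.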
Inside a Lexique, when there are transition costs between descriptors, these costs are used in the HMM context, i.e. in the methods fb, backward, forward, and viterbi. In that case, the costs are added to the prediction at each transition between descriptors.
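To show how additive transition costs can enter such a computation, here is a generic Viterbi-style sketch in Python. It is not the library's viterbi method: the function `viterbi_sketch`, the dictionary layout of the costs, and the choice of a zero cost for pairs that are not listed are all assumptions made for the illustration.

```python
def letter(ch, coef=1.0):
    """Letter descriptor: coef where ch is found, 0 elsewhere (1-based)."""
    def value(seq, pos):
        ok = 1 <= pos <= len(seq) and seq[pos - 1] == ch
        return coef if ok else 0.0
    return value


def viterbi_sketch(descriptors, costs, seq):
    """Best additive score and descriptor path on a non-empty sequence.

    descriptors: list of callables (seq, pos) -> float
    costs: dict {(i, j): cost}, added when going from descriptor i to j;
           missing pairs cost 0 (assumption of this sketch).
    """
    n = len(descriptors)
    score = [d(seq, 1) for d in descriptors]
    back = []
    for pos in range(2, len(seq) + 1):
        new_score, pointers = [], []
        for j, d in enumerate(descriptors):
            # Best predecessor once the transition cost is added.
            best_i = max(range(n),
                         key=lambda i: score[i] + costs.get((i, j), 0.0))
            pointers.append(best_i)
            new_score.append(score[best_i] + costs.get((best_i, j), 0.0)
                             + d(seq, pos))
        score = new_score
        back.append(pointers)
    # Trace the best path back from the best final descriptor.
    last = max(range(n), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


descriptors = [letter("A"), letter("B")]
costs = {(0, 1): -0.5, (1, 0): -0.5}  # each change of descriptor adds -0.5

print(viterbi_sketch(descriptors, costs, "AABB"))  # (3.5, [0, 0, 1, 1])
```

Here the four matching positions contribute 4, and the single change of descriptor adds its cost of -0.5, for a total of 3.5.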