Building and using HMM

3 Building and using HMM

This part uses the data file seq_hmm.fa and the Lproportion file lprop2.

We want to compute partitions of a set of sequences from an HMM, and then compute proportions from the resulting partitions.

We load the sequences and the HMM.
import lsequence
ls=lsequence.Lsequence()
ls.read_nf('seq_hmm.fa')
import lcompte
lpr=lcompte.Lproportion(fic="lprop2")
print lpr
Notice that these proportions contain no transition proportion for the beginning of the sequences.
We compute an Lpartition with the Viterbi algorithm, and print the number of segments of each partition.
import lpartition
lpa=lpartition.Lpartition()
lpa.add_Lseq(ls)
import lexique
lx=lexique.Lexique()
lx.read_Lprop(lpr)
lpa.viterbi(lx)
for i in lpa:
    print len(i[1]),
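To make the Viterbi step more concrete, here is a minimal standalone sketch in Python 3. It does not use the Sarment modules: the two-state model, its probabilities, the test sequence and the helper names viterbi and count_segments are all illustrative assumptions, not values taken from lprop2.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

def count_segments(path):
    """Number of maximal runs of identical states in a path."""
    return sum(1 for i, s in enumerate(path) if i == 0 or s != path[i - 1])

# Hypothetical two-state model: state "1" is AT-rich, state "2" is GC-rich
states = ["1", "2"]
start_p = {"1": 0.5, "2": 0.5}
trans_p = {"1": {"1": 0.9, "2": 0.1}, "2": {"1": 0.1, "2": 0.9}}
emit_p = {
    "1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}
path = viterbi("AATTATGCGCGGCATTA", states, start_p, trans_p, emit_p)
print("".join(path), count_segments(path))
```

The segment count printed here plays the same role as len(i[1]) in the loop above: it is the number of runs of identical states in the most probable path.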
We build a new Lcompte of words of length 2 from the segmentations.
lc=lcompte.Lcompte()
lc.read_Lpart(lpa,2)
print lc
We compute a new HMM from the 1|1-proportions (lprior=1, lpost=1) of this Lcompte.
lpr2=lcompte.Lproportion()
lpr2.read_Lcompte(lc.rstrip(),lprior=1,lpost=1)
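The counting and proportion steps can be illustrated with a small standalone sketch in Python 3 that does not use the Sarment modules. It assumes that "1|1-proportions" means the proportion of one letter given the one letter before it. The sequence, the segmentation and the helper names count_words and conditional_proportions are hypothetical. The sketch only handles words inside a segment and ignores state-to-state transitions, which a full HMM estimation would also need.

```python
from collections import Counter, defaultdict

def count_words(seq, path, k=2):
    """Count k-letter words per state, keeping words that lie inside one segment."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - k + 1):
        window = path[i:i + k]
        if len(set(window)) == 1:      # the word does not cross a segment boundary
            counts[window[0]][seq[i:i + k]] += 1
    return counts

def conditional_proportions(word_counts):
    """Turn counts of 2-letter words into P(second letter | first letter)."""
    totals = Counter()
    for word, n in word_counts.items():
        totals[word[0]] += n
    return {word: n / totals[word[0]] for word, n in word_counts.items()}

seq = "AATTATGCGCGGCATTA"
path = "11111122222221111"   # hypothetical segmentation into two states
counts = count_words(seq, path)
for state in sorted(counts):
    print(state, dict(counts[state]), conditional_proportions(counts[state]))
```

In state "2", for instance, the word GC is seen 3 times and GG once, so the proportion of C after G is 3/4.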
We compute the Kullback-Leibler divergence from lpr to lpr2, using Monte Carlo simulation on 100 sequences of length 5000.
lpr.KL_MC(lpr2,100,5000)
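The idea behind a Monte Carlo estimate of this divergence can be sketched as follows, in standalone Python 3 without the Sarment modules. For simplicity the sketch uses plain first-order Markov chains rather than HMMs. The two chains p and q, the uniform starting letter and the function names are illustrative assumptions. Whether KL_MC normalises by the sequence length is not stated here, so the per-transition value returned below is also an assumption.

```python
import math
import random

def simulate(chain, length, rng):
    """Draw one sequence from a first-order Markov chain with a uniform start."""
    letters = sorted(chain)
    seq = [rng.choice(letters)]
    for _ in range(length - 1):
        row = chain[seq[-1]]
        seq.append(rng.choices(letters, weights=[row[b] for b in letters])[0])
    return seq

def log_likelihood(chain, seq):
    """Log-probability of the transitions in seq.

    The start term is omitted: it is the same uniform term in both models.
    """
    return sum(math.log(chain[a][b]) for a, b in zip(seq, seq[1:]))

def kl_monte_carlo(p, q, n_seq, length, seed=0):
    """Estimate the per-transition KL divergence from p to q by sampling from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_seq):
        seq = simulate(p, length, rng)
        total += log_likelihood(p, seq) - log_likelihood(q, seq)
    return total / (n_seq * (length - 1))

# Two hypothetical chains on the alphabet {A, B}
p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
q = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
print(kl_monte_carlo(p, q, n_seq=100, length=5000))
print(kl_monte_carlo(p, p, n_seq=10, length=100))   # identical models give 0.0
```

The sequences are simulated under the first model only, which is why the divergence is not symmetric: exchanging p and q generally gives a different value.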
We then compute new partitions of the studied sequences with this new HMM, using the forward-backward (FB) algorithm.
lx2=lexique.Lexique()
lx2.read_Lprop(lpr2)
lpa2=lpartition.Lpartition()
lpa2.add_Lseq(lpa.Lseq())
lpa2.fb(lx2)
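As a rough illustration of what a forward-backward pass computes, here is a minimal standalone sketch in Python 3 that does not use the Sarment modules. The two-state model, the test sequence and the function name forward_backward are hypothetical. Labelling each position with its most probable posterior state is one common way to derive a partition and is assumed here for illustration only; it is not a description of how fb builds its partitions.

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Return, for each position, the posterior probability of every state."""
    n = len(obs)
    # Forward pass, rescaled at each position to avoid numerical underflow
    fwd, scale = [], []
    for t in range(n):
        row = {}
        for s in states:
            if t == 0:
                prior = start_p[s]
            else:
                prior = sum(fwd[t - 1][r] * trans_p[r][s] for r in states)
            row[s] = prior * emit_p[s][obs[t]]
        c = sum(row.values())
        scale.append(c)
        fwd.append({s: row[s] / c for s in states})
    # Backward pass, using the same scaling factors
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * bwd[t + 1][r]
                   for r in states) / scale[t + 1]
            for s in states
        }
    # With this scaling, fwd[t][s] * bwd[t][s] is already the posterior
    return [{s: fwd[t][s] * bwd[t][s] for s in states} for t in range(n)]

# Hypothetical two-state model: state "1" is AT-rich, state "2" is GC-rich
states = ["1", "2"]
start_p = {"1": 0.5, "2": 0.5}
trans_p = {"1": {"1": 0.9, "2": 0.1}, "2": {"1": 0.1, "2": 0.9}}
emit_p = {
    "1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}
post = forward_backward("AATTATGCGCGGCATTA", states, start_p, trans_p, emit_p)
labels = "".join(max(states, key=lambda s: p[s]) for p in post)
print(labels)
```

Unlike Viterbi, which returns a single best path, this pass gives a probability for every state at every position, so uncertain regions near segment boundaries remain visible.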
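And so on: the new partitions can be used in turn to build an Lcompte, estimate proportions, and partition the sequences again.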