Segmenting a sequence

2 Segmenting a sequence

We work on a well-known sequence, the DNA sequence of λ−phage, in file lambda.seq, which is in specific format. In this example, we use a previously defined HMM, from file lprop1.

We load the sequence and the lexique corresponding to the HMM.

import sequence
s=sequence.Sequence(fic="lambda.seq")
len(s)
import lexique
lx=lexique.Lexique(fprop="lprop1")
print lx

We build and draw the partition of the states of the most probable "path" computed with Viterbi algorithm.
```
import partition
p=partition.Partition()
p.viterbi(s,lx)
len(p)
print p
p.draw_nf("lambda1.ps",num=1)
```
We compute the Matrice of the log-probabilities of the descriptors given the sequence and the model, by Forward-Backward algorithm.
```
import matrice
m=matrice.Matrice()
m.fb(s,lx)
print m[:10]
```
We build a new partition, in which each segment is made where the most probable descriptor in former Matrice is the same on a continuous set of positions.
```
p2=partition.Partition()
p2.read_Matrice(m)
len(p2)
p2.draw_nf("lambda2.ps",num=1)
```
And if we want to draw the partition such that the height of each arc is proportional with minus the density of log-probability of the model on this segment.
```
p2.draw_nf("lambda_val.ps",num=1,func=lambda x: -x.val()/len(x))
```

And we want to draw only the segments described by descriptor 3.

p2.draw_nf("lambda_val3.ps",seg=[3],func=lambda x: -x.val()/len(x))

We compute the proportion of common descriptions between both partitions.
```
float(p2.pts_comm(p))/len(s)
```

We build the 50-partitioning from the predictions of the lexique, and draw it.

import parti_simp
ps=parti_simp.Parti_simp()
ps.mpp(s,lx,50)
ps.draw_nf("lambda_ps.ps",num=1)

We compute the list of the similarities between the partitions of ps and both partitions p1 and p2.
```
l=[]
for i in range(len(ps)):
   l.append([i+1,float(ps[i].pts_comm(p))/len(s),float(ps[i].pts_comm(p2))/len(s)])

for i in l:
   print i    
```
Notice and compare the best scores and their corresponding number of segments with the actual number of segments of p and p2.