
2  Segmenting a sequence

We work on a well-known sequence, the DNA sequence of λ-phage, stored in the file lambda.seq in a specific format. In this example, we use a previously defined HMM from the file lprop1.

  1. We load the sequence and the lexique corresponding to the HMM.
    import sequence
    s=sequence.Sequence(fic="lambda.seq")
    len(s)
    import lexique
    lx=lexique.Lexique(fprop="lprop1")
    print lx
  2. We build and draw the partition of the states of the most probable "path", computed with the Viterbi algorithm.
    import partition
    p=partition.Partition()
    p.viterbi(s,lx)
    len(p)
    print p
    p.draw_nf("lambda1.ps",num=1)
  3. We compute the Matrice of the log-probabilities of the descriptors given the sequence and the model, using the Forward-Backward algorithm.
    import matrice
    m=matrice.Matrice()
    m.fb(s,lx)
    print m[:10]
  4. We build a new partition in which each segment is a maximal run of contiguous positions that share the same most probable descriptor in the previous Matrice.
    p2=partition.Partition()
    p2.read_Matrice(m)
    len(p2)
  5. We draw this partition so that the height of each arc is proportional to minus the density of log-probability of the model on the segment, that is, minus the segment's value divided by its length.
    p2.draw_nf("lambda_val.ps",num=1,func=lambda x: -x.val()/len(x))
  6. We draw only the segments described by descriptor 3.
    p2.draw_nf("lambda_val3.ps",seg=[3],func=lambda x: -x.val()/len(x))
  7. We compute the proportion of positions that have the same description in both partitions.
    float(p2.pts_comm(p))/len(s)
  8. We build the 50-partitioning from the predictions of the lexique, and draw it.
    import parti_simp
    ps=parti_simp.Parti_simp()
    ps.mpp(s,lx,50)
    ps.draw_nf("lambda_ps.ps",num=1)
  9. We compute the list of the similarities between the partitions of ps and both partitions p and p2.
    l=[]
    for i in range(len(ps)):
        l.append([i+1,float(ps[i].pts_comm(p))/len(s),
                  float(ps[i].pts_comm(p2))/len(s)])
    for i in l:
        print i

    Notice and compare the best scores and their corresponding numbers of segments with the actual numbers of segments of p and p2.
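
To make step 2 more concrete, here is a minimal sketch of the Viterbi algorithm in plain Python, independent of the modules used above. It is not the implementation behind p.viterbi: the two-state model (an AT-rich state and a GC-rich state), its probabilities and all function and variable names are assumptions chosen for illustration only.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the most probable state path for obs under a toy HMM.

    log_init[s], log_trans[s][t] and log_emit[s][o] are log-probabilities.
    """
    # score[s]: best log-probability of any path ending in state s so far.
    score = dict((s, log_init[s] + log_emit[s][obs[0]]) for s in states)
    back = []  # back[i][t]: best predecessor of state t at position i + 1
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for t in states:
            prev = max(states, key=lambda s: score[s] + log_trans[s][t])
            new_score[t] = score[prev] + log_trans[prev][t] + log_emit[t][o]
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model (illustrative values, not taken from lprop1).
states = ["AT", "GC"]
log_init = {"AT": math.log(0.5), "GC": math.log(0.5)}
log_trans = {
    "AT": {"AT": math.log(0.9), "GC": math.log(0.1)},
    "GC": {"AT": math.log(0.1), "GC": math.log(0.9)},
}
log_emit = {
    "AT": dict((c, math.log(q)) for c, q in zip("ATCG", (0.4, 0.4, 0.1, 0.1))),
    "GC": dict((c, math.log(q)) for c, q in zip("ATCG", (0.1, 0.1, 0.4, 0.4))),
}

toy_path = viterbi("AAAAGGGG", states, log_init, log_trans, log_emit)
print(toy_path)
```

Consecutive positions of the returned path that carry the same state form the segments of the resulting partition.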

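The ideas behind steps 4 and 7 can also be sketched in plain Python. The code below does not use the Matrice or Partition classes; it assumes a hypothetical list of per-position log-probability rows, groups positions into maximal runs sharing the same most probable descriptor, and computes the proportion of positions on which two partitions agree. All names and the toy values are illustrative assumptions.

```python
def segments_from_logprobs(rows):
    """Group positions into maximal runs sharing the same best descriptor.

    rows[i][k] is the log-probability of descriptor k at position i.
    Returns a list of (start, end, descriptor) tuples, end exclusive.
    """
    segments = []
    for pos, row in enumerate(rows):
        best = max(range(len(row)), key=lambda k: row[k])
        if segments and segments[-1][2] == best:
            # Same descriptor as the previous position: extend the run.
            start = segments[-1][0]
            segments[-1] = (start, pos + 1, best)
        else:
            segments.append((pos, pos + 1, best))
    return segments


def labels(segments):
    """Expand segments back to one descriptor label per position."""
    out = []
    for start, end, desc in segments:
        out.extend([desc] * (end - start))
    return out


def common_proportion(seg_a, seg_b):
    """Fraction of positions carrying the same descriptor in both partitions."""
    la, lb = labels(seg_a), labels(seg_b)
    same = sum(1 for a, b in zip(la, lb) if a == b)
    return float(same) / len(la)


# Hypothetical log-probabilities for 5 positions and 2 descriptors.
toy = [[-0.1, -2.3], [-0.2, -1.7], [-1.9, -0.2], [-2.5, -0.1], [-0.3, -1.4]]
p_toy = segments_from_logprobs(toy)
q_toy = [(0, 3, 0), (3, 5, 1)]  # a second, hand-written partition
print(p_toy)
print(common_proportion(p_toy, q_toy))
```

As in step 7, the agreement is counted position by position and divided by the sequence length, so two partitions with different numbers of segments can still be compared.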
