Previous Up Next

2  Segmenting a sequence



We work on a well-known sequence, the DNA sequence of λ−phage, in file lambda.seq, which is in specific format. In this example, we use a previously defined HMM, from file lprop1.
  1. We load the sequence and the lexique corresponding to the HMM.
    import sequence
    s=sequence.Sequence(fic="lambda.seq")
    len(s)
    import lexique
    lx=lexique.Lexique(fprop="lprop1")
    print lx
    


  2. We build and draw the partition of the states of the most probable "path" computed with Viterbi algorithm.
    import partition
    p=partition.Partition()
    p.viterbi(s,lx)
    len(p)
    print p
    p.draw_nf("lambda1.ps",num=1)
    


  3. We compute the Matrice of the log-probabilities of the descriptors given the sequence and the model, by Forward-Backward algorithm.
    import matrice
    m=matrice.Matrice()
    m.fb(s,lx)
    print m[:10]
    


  4. We build a new partition, in which each segment is made where the most probable descriptor in former Matrice is the same on a continuous set of positions.
    p2=partition.Partition()
    p2.read_Matrice(m)
    len(p2)
    p2.draw_nf("lambda2.ps",num=1)
    


  5. And if we want to draw the partition such that the height of each arc is proportional with minus the density of log-probability of the model on this segment.
    p2.draw_nf("lambda_val.ps",num=1,func=lambda x: -x.val()/len(x))
    
  6. And we want to draw only the segments described by descriptor 3.
    p2.draw_nf("lambda_val3.ps",seg=[3],func=lambda x: -x.val()/len(x))
    
  7. We compute the proportion of common descriptions between both partitions.
    float(p2.pts_comm(p))/len(s)
    


  8. We build the 50-partitioning from the predictions of the lexique, and draw it.
    import parti_simp
    ps=parti_simp.Parti_simp()
    ps.mpp(s,lx,50)
    ps.draw_nf("lambda_ps.ps",num=1)
    


  9. We compute the list of the similarities between the partitions of ps and both partitions p1 and p2.
    l=[]
    for i in range(len(ps)):
       l.append([i+1,float(ps[i].pts_comm(p))/len(s),float(ps[i].pts_comm(p2))/len(s)])
    
    for i in l:
       print i    
    
    Notice and compare the best scores and their corresponding number of segments with the actual number of segments of p and p2.

Previous Up Next