Previous Up Next

6  Predicting the number of classes



In this section, we build a random sequence, on a given number of classes, from a set of markovian models, and we use the probability algorithm to predict the given number.

In this example, the model is in file lprop1.
  1. We build a Partition, on a virtual sequence of length 10000, in 23 segments, such that each segment is at least 50 positions long.
    nbcl=23  # for example
    import lcompte
    lp=lcompte.Lproportion(fic="lprop1")
    import partition
    p=partition.Partition()
    p.build_random(10000,nbcl,ec=50)
    


  2. We attribute predictor numbers to the Segments of the Partition.
    lnum=lp.num()
    import random
    numr=random.choice(lnum)
    for s in p:
        s.g_num([numr])
        x=random.choice(lnum)
        while x==numr:
            x=random.choice(lnum)
        numr=x
    


  3. Then we build a Sequence from the Partition and the Lproportion.
    import sequence
    s=sequence.Sequence()
    s.read_Part(p,lp)
    


  4. We compute the segmentation probabilities on the sequence from the models used to build the sequence, up to 100 classes.
    import lexique
    lx=lexique.Lexique(Lprop=lp)
    lprob=lx.probability(s,100)
    
    from rpy import *   #using R to plot it
    r.plot(range(1,len(lprob)+1),lprob,ylab="Log-probability",xlab="Nbe of classes")
    
    


  5. We compute the logarithm of bayesian factor: the probability of the sequence under the segmentations in k-segments for all k>=n, divided by the probability of the sequence under the segmentations in k-segments for all k<n:
    import math
    def expmean(l): #computes the logarithm of the mean of exponentials
        m=max(l)
        x=0
        for i in l:
            x+=math.exp(i-m)
        return(math.log(x/len(l))+m)
    
    lbay=[]
    for i in range(1,len(lprob)):
        lbay.append(expmean(lprob[i:])-expsmean(lprob[:i]))
    
    r.plot(range(2,len(lprob)+1),lbay,ylab="Log of the Bayes factor",xlab="Nbe of classes")
    


  6. We select the biggest number of classes that is over 3.2.
    r.abline(h=3.2,col="red")
    
    nbc=lbay.index(filter(lambda x: x>=3.2,lbay)[-1])+2
    print nbc
    

Previous Up Next