This class is used to store sequence data, such as genomic or protein
sequences. It represents a succession of letters, such as
acagcaggcatagacaggatacagatttta.
Positions in a Sequence are numbered from 0 to length-1.
A Sequence can have a name, which is written on the first
line in FASTA format, after the ">".
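As a minimal illustration of the numbering and naming conventions above, the following sketch works on a plain Python string rather than on the Sequence class itself. The helper parse_fasta is hypothetical, is not part of this library, and handles only a single-record FASTA text.

```python
def parse_fasta(text):
    """Return (name, letters) for a single-record FASTA text (hypothetical helper)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' line")
    name = lines[0][1:].strip()      # the name follows the ">" on the first line
    letters = "".join(line.strip() for line in lines[1:])
    return name, letters


name, letters = parse_fasta(">example\nacagcaggca\ntagacaggat\n")
first = letters[0]                   # position 0 is the first letter
last = letters[len(letters) - 1]     # position length-1 is the last letter
```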
Construction
__init__
The optional keyword fic allows
construction by reading a file.
Sequence is implemented as an array in C++. On
construction, the memory for a Sequence must be allocated by one of:
generate
generates an empty Sequence with a
given length;
read_nf
reads a file in
specific or FASTA format; the format is recognized
from the filename: a file in specific format must end with
.seq, and a file in FASTA format must end with .fa
or .fst;
creates a new lg-length Sequence. In
that case, deb and fin are not read;
etat_init=e
starts the generation with
descriptor number e if it is valid; otherwise, starts with a
random descriptor of the
Lproportion;
read_Part
builds a random Sequence from a
Partition and a
Lproportion; each
Segment of the
Partition must have descriptor
numbers, and each number must be the number of a
Proportion of the
Lproportion. At each
position, the Proportion
corresponding to the number is used to randomly generate a letter,
as for Sequence generation;
copy
makes a deep copy of another Sequence.
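The random construction described for read_Part can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the function generate_from_partition is hypothetical, a partition is represented here as an ordered list of (start, end, number) tuples with end inclusive, and the Lproportion as a dict mapping each number to a {letter: weight} dict.

```python
import random


def generate_from_partition(segments, proportions, rng=None):
    """Draw one letter per position, using the proportion numbered by its segment.

    segments: ordered list of (start, end, number), end inclusive (assumed layout).
    proportions: dict mapping a number to a {letter: weight} dict (assumed layout).
    """
    rng = rng or random.Random()
    letters = []
    for start, end, number in segments:
        if number not in proportions:
            # Every segment number must name an existing proportion.
            raise KeyError(f"no proportion numbered {number}")
        alphabet = list(proportions[number])
        weights = [proportions[number][a] for a in alphabet]
        length = end - start + 1
        # One independent weighted draw per position of the segment.
        letters.extend(rng.choices(alphabet, weights=weights, k=length))
    return "".join(letters)


segments = [(0, 9, 1), (10, 14, 2)]
proportions = {
    1: {"a": 0.5, "c": 0.5},
    2: {"g": 1.0},
}
seq = generate_from_partition(segments, proportions, random.Random(0))
```

In this sketch the first ten positions contain only "a" or "c", and the last five are always "g", because each segment draws from the proportion its number designates.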
Handling
__len__
returns the length of the
Sequence;
__getitem__ and __getslice__
are
implemented, to get characters and sub-sequences respectively.
Beware: __getslice__ DOES NOT create a new
Sequence object, only a shallow copy that shares its data with
the original, hence it must be used with care;
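The practical consequence of a shallow slice can be shown with standard Python types standing in for Sequence. This is an analogy only, not the library's behavior verified in code: a memoryview slice shares memory with its source, much as the warning above describes, whereas slicing the bytearray directly makes an independent copy.

```python
buffer = bytearray(b"acagcaggca")

view = memoryview(buffer)[2:6]      # shares memory with buffer, like a shallow slice
independent = bytes(buffer[2:6])    # an independent copy of the same four letters

buffer[2] = ord("t")                # modify the original after slicing

shared_letters = view.tobytes()     # the change is visible through the shared view
copied_letters = independent        # the independent copy is unaffected
```

When a sub-sequence must survive later changes to its source, an explicit deep copy of the slice is the safer choice.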