class Sequence
module sequence
This class stores sequence data, such as genomic or protein
sequences. It represents a succession of letters, such as
acagcaggcatagacaggatacagatttta.
Positions in a Sequence are numbered from 0 to length-1.
A Sequence can have a name, which is written on the first
line in FASTA format, after the ">".
Construction
- __init__: the optional keyword fic allows construction by reading
  a file.
Sequence is implemented as an array in C++. For construction, the
memory for a Sequence must be allocated by one of the following
methods:
- generate: generates an empty Sequence with a given length;
- read_nf: reads a file, given its name, in specific or FASTA format.
  The format is recognized from the filename: it must end with .seq
  for the specific format, and with .fa or .fst for FASTA format;
- read_prop: builds or changes the Sequence randomly from a
  Proportion; see Sequence generation.
  Optional keywords:
  - deb=d: changes only from position d (>= 0) onwards, d included;
  - fin=f: changes only up to position f (< len()), f included;
  - long=lg: creates a new Sequence of length lg. In that case, deb
    and fin are ignored;
- read_Lprop: builds the Sequence randomly from a Lproportion and
  returns the resulting Partition; see Sequence generation.
  Optional keywords:
  - deb=d: changes only from position d (>= 0) onwards, d included;
  - fin=f: changes only up to position f (< len()), f included;
  - long=lg: creates a new Sequence of length lg. In that case, deb
    and fin are ignored;
  - etat_init=e: starts the generation with descriptor number e if
    it is valid. Otherwise, starts with a random descriptor of the
    Lproportion;
- read_Part: builds the Sequence randomly from a Partition and a
  Lproportion. Each Segment of the Partition must have descriptor
  numbers, and each number must be the number of a Proportion of the
  Lproportion. At each position, the Proportion corresponding to the
  number is used to randomly generate a letter, as for Sequence
  generation;
- copy: makes a deep copy from another Sequence.
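The deb, fin and long keywords of read_prop can be illustrated with a small pure-Python sketch. This is not the C++ implementation behind Sequence: the function name fill_from_proportion, the dict used to stand in for a Proportion, and the lg parameter (playing the role of long) are all assumptions made for the illustration. It is written for Python 3.

```python
import random


def fill_from_proportion(letters, proportion, deb=0, fin=None, lg=None, rng=random):
    """Draw letters at random, mimicking the keywords described for read_prop.

    letters    -- list of single-character strings (hypothetical stand-in
                  for the content of a Sequence)
    proportion -- dict mapping letter -> weight (hypothetical stand-in for
                  a Proportion)
    deb, fin   -- first and last positions to redraw, both included
    lg         -- if given, build a new list of lg letters; deb and fin
                  are then ignored
    """
    alphabet = list(proportion)
    weights = [proportion[a] for a in alphabet]
    if lg is not None:
        # New content of length lg: deb and fin are not read.
        return rng.choices(alphabet, weights=weights, k=lg)
    if fin is None:
        fin = len(letters) - 1
    if not 0 <= deb <= fin < len(letters):
        raise ValueError("need 0 <= deb <= fin < len(letters)")
    # Redraw positions deb..fin inclusive; the other positions are untouched.
    letters[deb:fin + 1] = rng.choices(alphabet, weights=weights, k=fin - deb + 1)
    return letters
```

With a single-letter proportion the result is deterministic, which makes the inclusive bounds easy to check: redrawing positions 2 to 4 of ten letters A from {"C": 1.0} changes exactly three letters.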
Handling
- __len__: returns the length of the Sequence;
- __getitem__ and __getslice__: are implemented, to get characters
  and sub-sequences respectively. Beware: __getslice__ DOES NOT
  create an independent Sequence object, but only a shallow copy,
  hence it must be used with care;
- __setitem__: changes a letter in the Sequence;
- __setslice__: replaces a segment of the Sequence with the letters
  of a string or a Sequence. BEWARE: if the inserted part has the
  same length as the replaced segment, the Sequence is modified in
  place; otherwise a new Sequence is built. Hence the behaviour can
  differ when the replacement is made inside a subsequence. See the
  example below.
For example:
> import sequence
> s=sequence.Sequence(fic="toto.fa")
> len(s)
10
> b=s[3:5]
> print b
3
ACG
> b[2]='A'
> print s[3:5]
3
ACA
> b[:2]="TT"
> print s[3:5] # b and s are still linked
3
TTA
> b[:2]="ACG"
> print b
4
ACGA
> print s[3:5] # b and s are no longer linked
3
TTA
#################################
- alpha: returns the list of distinct letters in the Sequence;
- shuffle: randomly shuffles the Sequence by
  (len*(log(len)+1)/2) random transpositions;
- g_name: sets the name;
- name: returns the name.
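The transposition count given for shuffle can be made concrete with a short sketch. It works on a plain Python list rather than a Sequence, the function name is hypothetical, and the natural logarithm is an assumption, since the base of the log is not stated above. It is written for Python 3.

```python
import math
import random


def shuffle_by_transpositions(letters, rng=random):
    """Shuffle a list in place with len*(log(len)+1)/2 random swaps.

    Assumption: log is the natural logarithm. Each swap exchanges two
    positions drawn uniformly at random (they may coincide).
    """
    n = len(letters)
    if n < 2:
        return letters
    swaps = int(n * (math.log(n) + 1) / 2)
    for _ in range(swaps):
        i = rng.randrange(n)
        j = rng.randrange(n)
        letters[i], letters[j] = letters[j], letters[i]
    return letters
```

Since only swaps are applied, the length and the letter counts of the input are preserved.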
Input-Output
Specific format is:
description                              | example
-----------------------------------------|--------------
length of the sequence                   | 20
sequence, with spaces and line breaks    | ACGGGAAGCTAA
wherever wanted                          | AGCTGCG T
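As an illustration of the specific format, here is a minimal reader sketch in Python 3. The function name parse_specific is hypothetical, and the check of the declared length against the number of letters is an assumption; the description above does not say how the library handles a mismatch.

```python
def parse_specific(text):
    """Read the specific format: a length line, then the letters.

    Spaces and line breaks inside the sequence are dropped. Raising on a
    length mismatch is an assumption of this sketch.
    """
    header, _, body = text.strip().partition("\n")
    expected = int(header)
    letters = "".join(body.split())
    if len(letters) != expected:
        raise ValueError(
            "declared length %d, found %d letters" % (expected, len(letters))
        )
    return letters
```

Applied to the example of the table, the 12 letters of the first line and the 8 letters of the second are joined into one 20-letter string.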
- __str__: outputs in specific format;
- fasta: outputs in FASTA format. The name of the Sequence is written
  on the first line after ">".
  Optional keyword:
  - lg=d: sets the length of the lines (default: 80). If d is 0, the
    sequence is returned on one line;
- seq: outputs the bare sequence of letters as a string.
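The effect of the lg keyword of fasta can be sketched as follows in Python 3. The function to_fasta is a hypothetical stand-alone helper working on strings, not the method of the class, and the trailing newline is an assumption.

```python
def to_fasta(name, letters, lg=80):
    """Format a name and a string of letters as FASTA text.

    lg is the line length; lg=0 puts the whole sequence on one line.
    """
    if lg < 0:
        raise ValueError("lg must be >= 0")
    header = ">" + name
    if lg == 0:
        return header + "\n" + letters + "\n"
    # Cut the letters into chunks of at most lg characters.
    lines = [letters[i:i + lg] for i in range(0, len(letters), lg)]
    return "\n".join([header] + lines) + "\n"
```

For instance, to_fasta("toto", "ACGTACGTAC", lg=4) gives the header line followed by ACGT, ACGT and AC, each on its own line.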