Low Complexity Filtering
By default, the server filters your query sequence for regions of low
compositional complexity. Such regions commonly give spuriously high
scores that reflect compositional bias rather than significant
position-by-position alignment.
Filtering can eliminate these potentially confounding matches (e.g.,
hits against proline-rich regions or poly-A tails) from the BLAST reports,
leaving regions whose BLAST statistics reflect the specificity of their
pairwise alignment. Queries searched with the blastn program are filtered
with DUST. Other programs use SEG.
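To make the idea of "low compositional complexity" concrete, here is a minimal Python sketch that scores sliding windows by Shannon entropy and reports the stretches that fall below a cutoff. This is a simplified stand-in for illustration only: it is not the DUST or SEG algorithm, and the function names, the window length of 12, and the threshold of 1.0 bit are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy, in bits per residue, of the letters in `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(seq, window=12, threshold=1.0):
    """Return merged half-open (start, end) intervals of low-entropy windows.

    `window` and `threshold` are illustrative values, not the parameters
    used by DUST or SEG.
    """
    intervals = []
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            end = start + window
            # Merge with the previous interval when the two overlap or touch.
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    query = "ACGT" * 5 + "A" * 20 + "ACGT" * 5
    print(low_complexity_windows(query))
```

A poly-A run scores close to 0 bits, while a window with equal amounts of A, C, G and T scores 2 bits, so only the homopolymer stretch (plus a few flanking positions) is reported.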
Low complexity sequence found by a filter program is replaced with the
letter `N' in nucleotide sequences (e.g., `NNNNNNNNNNN') and the
letter `X' in protein sequences (e.g., `XXXXXXX').
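The substitution step can be sketched in a few lines of Python. The function below is hypothetical and assumes the low complexity regions have already been located by a filter program and are supplied as half-open (start, end) intervals; it only illustrates the masking convention described above, not how the server itself is implemented.

```python
def mask_regions(seq, intervals, is_protein=False):
    """Replace each (start, end) half-open interval with the masking letter.

    Uses 'X' for protein sequences and 'N' for nucleotide sequences.
    `intervals` is assumed to come from a separate filtering step.
    """
    mask_char = "X" if is_protein else "N"
    residues = list(seq)
    for start, end in intervals:
        # Clamp the interval so out-of-range coordinates are ignored.
        for i in range(max(start, 0), min(end, len(residues))):
            residues[i] = mask_char
    return "".join(residues)


if __name__ == "__main__":
    print(mask_regions("ACGTAAAAAAAACGT", [(4, 12)]))              # ACGTNNNNNNNNCGT
    print(mask_regions("MKPPPPPPPLV", [(2, 9)], is_protein=True))  # MKXXXXXXXLV
```

Masking preserves the length of the query, so alignment coordinates in the report still refer to positions in the original sequence.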
References
- Hancock, J.M. and Armstrong, J.S. (1994) SIMPLE34: an improved and enhanced
implementation for VAX and Sun computers of the SIMPLE algorithm for analysis
of clustered repetitive motifs in nucleotide sequences. Comput. Applic.
Biosci., 10, 67-70.
- Wootton, J.C. and Federhen, S. (1993) Statistics of
local complexity in amino acid sequences and sequence databases.
Comput. Chem., 17, 149-163.
- Wootton, J.C. and Federhen, S. (1996) Analysis of compositionally biased
regions in sequence databases. Methods Enzymol., 266, 554-571.
- Altschul, S.F., Boguski, M.S., Gish, W. and Wootton, J.C. (1994) Issues in
searching molecular sequence databases. Nature Genet., 6, 119-129.