What is a Motif?
Let D = {A, C, G, T} be the alphabet of nucleotide sequences. A
motif (pattern, signal, ...)
is an object denoting a set of sequences over this alphabet, in either a
deterministic or a probabilistic way.
Given a sequence S and a motif m, we say that the motif m occurs in
S if any of the sequences denoted by m occurs in S.
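As a minimal sketch of this definition, restricted to the deterministic case, a motif can be represented as an explicit set of words and tested against a sequence. The function name `occurs` and the example sequence are illustrative assumptions, not part of the original text.

```python
def occurs(motif, sequence):
    """Return True if any word denoted by the motif is a substring of the sequence.

    `motif` is assumed to be a plain Python set of strings over {A, C, G, T}.
    """
    return any(word in sequence for word in motif)


# Hypothetical motif made of two of the words listed later on this page.
motif = {"CTAAAAATAA", "TTAAAAATAA"}

# Hypothetical sequence S containing the first word.
S = "GGCTAAAAATAACC"

print(occurs(motif, S))
```

Enumerating the set explicitly only works for small motifs; the descriptors in the next section are more compact ways of denoting the same kind of set.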
A Hierarchy of Motif Descriptors
Sequence motifs can be described in a wide variety of ways.
- Exact Word. The description is a specific sequence over the alphabet.
CTTAAAATAA
- Consensus Sequences. The description allows for the
specification of alternative nucleotides occurring at a given position.
In the example, the IUPAC codes Y, W and R stand for C or T, A or T, and A or G.
YTWWAAATAR (Consensus MEF2 sequence, Yu et al., 1992)
CTAAAAATAA
TTAAAAATAA
TTTAAAATAA
CTATAAATAA
TTATAAATAA
CTTAAAATAG
TTTAAAATAG
..........
- Regular Expressions. The description is built on an
extension of the original alphabet. Among the new symbols of this extended
alphabet, there are symbols denoting the alternative occurrence of a number of
nucleotides at a given position, and symbols denoting that a given
position may be absent.
C..?[STA]..C[STA][^P]C
(ferredoxin, iron-sulfur binding region signature, PROSITE database, Bairoch, 1991)
- Position Weight Matrices. The description includes a
weight (score, probability, likelihood) for each symbol occurring at each
position along the motif.
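To make the last descriptor concrete, here is a minimal sketch of scoring a sequence against a position weight matrix. The matrix values, the uniform background of 0.25 and the names `PWM`, `score` and `best_hit` are all hypothetical, chosen only for illustration; the log-odds score is one common choice, not necessarily the one used in this course.

```python
import math

# Hypothetical per-position probabilities for a motif of length 3.
# Each column sums to 1; the numbers are illustrative, not real data.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed uniform background probability for every nucleotide.
BACKGROUND = 0.25


def score(window):
    """Sum of log2(probability / background) over the positions of the window."""
    return sum(math.log2(col[nt] / BACKGROUND) for col, nt in zip(PWM, window))


def best_hit(sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    k = len(PWM)
    return max((score(sequence[i:i + k]), i) for i in range(len(sequence) - k + 1))


best_score, start = best_hit("CCATGCC")
print(start, round(best_score, 3))
```

Unlike the deterministic descriptors, every window receives a score, so deciding whether the motif "occurs" requires choosing a threshold.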
Follow the link for An Introduction to Position Weight Matrices
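The consensus and regular-expression descriptors in the list above are closely related: a consensus sequence written with IUPAC ambiguity codes can be translated into a regular expression, or expanded into the exact words it denotes. The following is a sketch under that assumption; the helper names `consensus_to_regex` and `expand` and the test sequence are hypothetical.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate a consensus sequence into a regular expression string."""
    parts = []
    for code in consensus:
        letters = IUPAC[code]
        # A single letter stays as is; alternatives become a character class.
        parts.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "".join(parts)


def expand(consensus):
    """List every exact word denoted by the consensus sequence."""
    return ["".join(word) for word in product(*(IUPAC[code] for code in consensus))]


pattern = consensus_to_regex("YTWWAAATAR")
words = expand("YTWWAAATAR")
print(pattern)
print(len(words))

# Hypothetical sequence used only to show a search.
match = re.search(pattern, "GGCTTAAAATAGCC")
print(match.start(), match.group())
```

For YTWWAAATAR the pattern is `[CT]T[AT][AT]AAATA[AG]`, and the expansion contains 16 words, including the exact word CTTAAAATAA shown in the first item of the list.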
PRACTICAL