Consensus sequence:
Given a collection of known binding sites, aligned so that they all have the
same length, a consensus sequence is built by taking the most preferred (most
frequent) base at each position of the site. This pattern can then be used to
search other sequences for new sites.
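The majority-vote step can be sketched in a few lines of Python, assuming the sites are already aligned and of equal length. The function name `consensus` and the alphabetical tie-break are choices made for this illustration, not part of the original method. The tie-break matters at position 4 of the six example sites listed below, where A and G each occur three times; it picks A.

```python
from collections import Counter


def consensus(sites: list[str]) -> str:
    """Most frequent base at each position of equal-length, aligned sites.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    result = []
    for column in zip(*sites):  # one tuple of bases per position
        counts = Counter(column)
        # Highest count wins; among equal counts the alphabetically first base wins.
        best = min(counts, key=lambda base: (-counts[base], base))
        result.append(best)
    return "".join(result)


sites = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]
print(consensus(sites))  # TATAAT
```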
Disadvantage:
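A consensus keeps only one base per position, so information about how variable
each position is gets lost, and exact matching against it misses many genuine
sites. A fixed number of mismatches is therefore usually allowed to express
some degree of ambiguity.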
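A search that tolerates a fixed number of mismatches can be sketched as a sliding window that counts differing positions (the Hamming distance) between each window and the consensus. The helpers `hamming` and `find_sites` and the scanned sequence are hypothetical, for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(sequence: str, pattern: str, max_mismatches: int) -> list[tuple[int, str]]:
    """Return (0-based start, window) for every window within max_mismatches of pattern."""
    k = len(pattern)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if hamming(window, pattern) <= max_mismatches:
            hits.append((start, window))
    return hits


seq = "CCTATAATGGTACGATCC"  # hypothetical sequence to scan
print(find_sites(seq, "TATAAT", 1))  # [(2, 'TATAAT')]
print(find_sites(seq, "TATAAT", 2))  # [(2, 'TATAAT'), (10, 'TACGAT')]
```

With one mismatch allowed, only the exact copy at position 2 is reported. Raising the limit to two also finds TACGAT (sequence 1 of the example below), which differs from TATAAT at two positions: the mismatch limit trades missed sites against false hits.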
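Example:
sequence 1             | TACGAT
sequence 2             | TATAAT
sequence 3             | TATAAT
sequence 4             | GATACT
sequence 5             | TATGAT
sequence 6             | TATGTT
-----------------------+-------
consensus sequence     | TATAAT
consensus (IUPAC code) | TATRNT

IUPAC codes: R = A or G, N = any base. At position 4, A and G each occur in
three of the six sites; the plain consensus lists A, while the IUPAC
consensus keeps both as R.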
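The notes do not say which rule produced the IUPAC row. The sketch below assumes one simple, hypothetical rule, chosen because it reproduces TATRNT on these six sites: write a single base if it occurs in at least 75% of the sites; otherwise write the two-base IUPAC code if exactly two different bases occur; otherwise write N. Other thresholds are common and can give a different symbol, for example at position 5, where A occurs in four of the six sites. The helper `matches` (also hypothetical) shows how a degenerate pattern is then used in a search: a base matches a symbol if it is one of the bases that symbol stands for.

```python
from collections import Counter

# Two-base IUPAC nucleotide codes, keyed by the unordered pair of bases.
TWO_BASE_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

# Bases represented by each IUPAC symbol.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_consensus(sites: list[str], single_threshold: float = 0.75) -> str:
    """Degenerate consensus under an ASSUMED, illustrative rule:
    one base if its frequency >= single_threshold,
    else the two-base code if exactly two different bases occur,
    else N.
    """
    n = len(sites)
    out = []
    for column in zip(*sites):
        counts = Counter(column)
        base, count = counts.most_common(1)[0]
        if count / n >= single_threshold:
            out.append(base)
        elif len(counts) == 2:
            out.append(TWO_BASE_CODE[frozenset(counts)])
        else:
            out.append("N")
    return "".join(out)


def matches(window: str, pattern: str) -> bool:
    """True if every base of window is allowed by the IUPAC symbol at that position."""
    return len(window) == len(pattern) and all(
        b in IUPAC_BASES[p] for b, p in zip(window, pattern)
    )


sites = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]
print(iupac_consensus(sites))      # TATRNT
print(matches("TATGAT", "TATRNT"))  # True
print(matches("TACGAT", "TATRNT"))  # False
```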