Gibbs sampling:
- Looking for the most conserved (and unknown) pattern of a given fixed length
in a set of sequences.
- The algorithm works iteratively, repeating some
operations until some stopping condition is met (usually, pattern quality).
- The results are represented as weight matrices or logos.
- The score measures the significance of alignments sampled.
Disadvantage:
Due to the take of random decisions, this algorithm does not guarantee that
the actual best conserved pattern will be found having the highest score,
and converging to a local maximum, rather than to the global one. Therefore,
more than one running over the same input is recommended.
Example (AlignAce output):
Motif 1
GCAAAGCGTGA 0 13 0
GCACTGCCTGA 0 218 1
GCCCAGCGTGA 0 486 1
GCATTGCGGGC 1 251 0
GCAGAGCCCGA 1 440 1
GAAAAGCGCGC 2 312 1
GCACTGCCCGA 2 446 0
CCATTGCCCGA 3 196 0
GCAAAGCCTGA 4 100 1
GAGTAGCCTGA 4 155 0
GCAGAGCCTGC 4 348 1
*** *******
MAP Score: 15.1348
|
Example (makelogo output):
Algorithm.
Given n input sequences.
Starting step:
Select randomly one subsequence of fixed length from each input sequence to
build the initial set of occurences (pattern or motif).
Iteration step:
- Choose one input sequence
- Extract the occurence of this sequence and build a weight matrix with the
rest of occurences.
- Use this matrix to score every segment of the fixed length in the selected
sequence (sampling).
- Choose "randomly" a new occurrence in the selected sequence according
to the scores obtained by every segment candidate to be the new one.
- Add the new occurences to the output set.
- Repeat until no improvement (score) is reached.