In this practical we will use the consensus sequences of several well-known
binding sites to search for potential binding sites in a set of putative promoter
regions of genes found to be coexpressed in a DNA microarray experiment.
The consensus sequences are taken from:
P. Bucher. Journal of Molecular Biology 212: 563-578 (1990)
ADVICE: It helps to open two or more browser windows, keeping this
text in one of them and running the exercise in another.
Input sequences:
6 genes of Drosophila melanogaster.
WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden
from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).
Step 1: Exact matches of TATA-box consensus: STATAAAWR
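To make clear what an exact match to a degenerate consensus means, here is a minimal Python sketch. It is not how the RSA tools are implemented. It assumes the standard IUPAC ambiguity codes (S = C or G, W = A or T, R = A or G), scans the direct strand only, and uses a made-up toy sequence rather than one of the six Drosophila genes. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes, written as regular-expression classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "[CG]", "W": "[AT]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate a degenerate consensus such as STATAAAWR into a regex."""
    return "".join(IUPAC_REGEX[code] for code in consensus.upper())

def exact_matches(consensus, sequence):
    """Return (start, end, word) for every exact match, 1-based, direct strand only."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [
        (m.start() + 1, m.start() + len(consensus), m.group(1))
        for m in pattern.finditer(sequence.upper())
    ]

# Toy sequence invented for illustration (an assumption, not exercise data).
toy_sequence = "GGCTATAAAAGCC"
hits = exact_matches("STATAAAWR", toy_sequence)
for start, end, word in hits:
    print(start, end, word)
```

A full search would also scan the reverse complement, since the tool's output reports a strand for each match.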
Step 2: Partial matches of TATA-box consensus: STATAAAWR
- Repeat the process, this time increasing the number of mismatches allowed in
the pattern (try 1, 2 and 3 in the Substitutions box).
- Results will be displayed below the headline
"PatID Strand Pattern SeqID Start End Matching_word Score"
- Click the button Feature map to enter
a new menu about plotting the results.
- Click the button Go to obtain
a graphical output of the reported matches. Browse across the interactive map.
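The effect of the Substitutions setting can be sketched as a sliding-window count of mismatches against the consensus. This is an illustrative sketch only: the tool's own scoring may differ, the toy sequence is invented, the function names are hypothetical, and only the direct strand is scanned.

```python
# Standard IUPAC nucleotide codes, written as sets of allowed bases.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

def count_mismatches(consensus, word):
    """Number of positions where the base is not allowed by the consensus code."""
    return sum(1 for code, base in zip(consensus, word)
               if base not in IUPAC_BASES[code])

def approximate_matches(consensus, sequence, max_substitutions):
    """Return (start, end, word, mismatches) for windows within the mismatch limit."""
    consensus = consensus.upper()
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for i in range(len(sequence) - width + 1):
        word = sequence[i:i + width]
        mismatches = count_mismatches(consensus, word)
        if mismatches <= max_substitutions:
            hits.append((i + 1, i + width, word, mismatches))
    return hits

# Toy sequence invented for illustration (an assumption, not exercise data).
toy_sequence = "GGCTATAAAAGCC"
for limit in (0, 1, 2, 3):
    found = approximate_matches("STATAAAWR", toy_sequence, limit)
    print(limit, len(found))
```

Raising the limit can only add matches, never remove them, so the number of reported sites grows with each extra substitution allowed.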
Questions:
- Real TATA boxes are expected to lie roughly 20 to 30 bp upstream of the
Transcription Start Site (TSS). How many of the TATA boxes fall in this range? NOTE: TSS
annotations can easily contain errors, in which case these ranges and distances become unreliable.
- How many occurrences do you get if you allow 9 substitutions (everything)?
Do you get one occurrence at every position of the sequence? Why not? Think about the
option "prevent overlapping matches". Try switching it off.
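The arithmetic behind the last question can be sketched as follows. With 9 substitutions every 9-bp window matches, so a sequence of length L has L - 9 + 1 candidate windows per strand. The overlap rule shown here is an assumption: a simple left-to-right greedy filter, which may not be exactly what the tool does. The sequence length of 100 and the function names are hypothetical.

```python
def window_starts(sequence_length, width):
    """1-based start positions of every window of the given width."""
    return list(range(1, sequence_length - width + 2))

def drop_overlaps(starts, width):
    """Keep a match only if it begins after the previously kept match ends.

    Assumed greedy rule, scanning from left to right.
    """
    kept = []
    next_free = 1
    for start in sorted(starts):
        if start >= next_free:
            kept.append(start)
            next_free = start + width
    return kept

# Hypothetical 100-bp sequence and the 9-bp TATA-box consensus width.
all_starts = window_starts(100, 9)
non_overlapping = drop_overlaps(all_starts, 9)
print(len(all_starts), len(non_overlapping))
```

Under this assumed rule, the number of reported matches drops from one per window to about one per 9 bp when overlaps are prevented.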
Results: