In this practice, a weight matrix computed by P. Bucher (Journal of Molecular Biology 212: 563-578, 1990)
from sequences containing real TATA boxes will be obtained by accessing
the TRANSFAC database and reading the corresponding entry. This matrix
will then be used to scan the input set of promoter regions for putative TATA boxes.
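Weight matrices like this one start from an alignment of known sites. The basic first step, tallying how often each base occurs at each aligned position, can be sketched in Python as below. This is a minimal illustration that assumes gap-free, equal-length sites. The toy sites are hypothetical, and the published TATA-box matrix was derived with its own procedure, which this sketch does not reproduce.

```python
from collections import Counter

BASES = "ACGT"

def count_matrix(sites):
    """Build a position count matrix (rows = positions, columns = A, C, G, T)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(s[pos].upper() for s in sites)
        # Counter returns 0 for bases never seen at this position
        matrix.append([column[b] for b in BASES])
    return matrix

# Hypothetical toy alignment, NOT the sites used in Bucher (1990).
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG", "CATAAA"]
for pos, row in enumerate(count_matrix(toy_sites), start=1):
    print(pos, row)
```

Each printed row is one position of the motif; a scanning program then converts these counts into weights.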
ADVICE: It is useful to open two or more browser windows, keeping this
text in one of them and running the exercise in another.
Input sequences:
Promoter regions of 6 Drosophila melanogaster genes.
WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden
from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).
TRANSFAC database, the transcription factor database.
Step 1: Access TRANSFAC to acquire the TATA box entry
- Connect to the TRANSFAC database.
- Click on the Browse link.
- Click on the Matrix by Factor name link.
- Find TATA in the list of transcription factors and click on it.
- Click on one of these TATA-binding proteins (TBPs): T00794, T00796
or T0097 to open its information card.
- Lines starting with MX are the matrix entries related to this
factor. There are two matrices: select V$TATA_01 and look at the entry
containing the weight matrix (keep this window open!).
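In the usual TRANSFAC flat-file layout (an assumption here, check it against the entry you opened), the matrix is a P0 header row (A C G T) followed by one numbered row of base counts per position, closed by an XX line. A minimal Python sketch for pulling those counts out of a copied entry follows. The entry ID V$TOY_01 and its counts are hypothetical, not the real V$TATA_01 data.

```python
def parse_transfac_matrix(text):
    """Return (matrix_id, rows), each row being [A, C, G, T] counts."""
    matrix_id = None
    rows = []
    in_matrix = False
    for line in text.splitlines():
        tag = line[:2]
        fields = line.split()
        if tag == "ID" and len(fields) > 1:
            matrix_id = fields[1]
        elif tag in ("P0", "PO"):
            in_matrix = True  # header row "P0  A  C  G  T"; accept "PO" defensively
        elif in_matrix and fields and fields[0].isdigit():
            # "01  0  1  0  4  T": position, four counts, optional consensus letter
            rows.append([float(x) for x in fields[1:5]])
        elif in_matrix:
            in_matrix = False  # "XX", "//" or a blank line ends the matrix block
    return matrix_id, rows

# Hypothetical toy entry in TRANSFAC-like layout (not real V$TATA_01 counts).
TOY_ENTRY = """\
ID  V$TOY_01
XX
P0      A      C      G      T
01      0      1      0      4      T
02      5      0      0      0      A
03      0      0      0      5      T
04      5      0      0      0      A
05      4      0      0      1      A
06      4      0      1      0      A
XX
//
"""

matrix_id, rows = parse_transfac_matrix(TOY_ENTRY)
print(matrix_id, len(rows), rows[0])
```

The trailing consensus letter on each row is ignored; only the four counts are kept.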
Step 2: Use the matrix to find TATA boxes in a set of promoter regions.
- Open the promoter regions in FASTA format.
- Connect to the RSA tools server.
- In the menu (left frame), click on Pattern matching: patser (matrices).
- Copy and paste the matrix into the Matrix box.
- Select transfac in the Format (matrix) selector.
- Copy and paste the promoter regions into the Sequence box.
- Choose "single" in the Search strands selector.
- Press the Go button to submit the query.
- Press the Feature map button to open a new form
for plotting the results.
- Click the Go button to obtain
a graphical map of the reported matches.
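Conceptually, a matrix-scanning program such as patser converts the count matrix into log-odds weights and slides it along each sequence, reporting windows whose summed score reaches a threshold. The Python sketch below illustrates that idea on the direct strand only, in line with the "single" strand setting above. It assumes equiprobable background frequencies and a pseudocount of 1 spread according to the background; patser's own defaults, statistics and output format differ, and the toy counts and sequences are hypothetical.

```python
import math

BASES = "ACGT"

def read_fasta(text):
    """Parse FASTA text into a dict {name: uppercase sequence}."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks).upper()
            header = line[1:].split()
            name = header[0] if header else f"seq{len(seqs) + 1}"
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks).upper()
    return seqs

def log_odds(counts, pseudocount=1.0, background=None):
    """Turn [A, C, G, T] count rows into natural-log odds weights.

    A positive pseudocount avoids log(0) for bases never observed.
    """
    bg = background or {b: 0.25 for b in BASES}
    weights = []
    for row in counts:
        total = sum(row) + pseudocount
        weights.append({
            b: math.log((n + pseudocount * bg[b]) / total / bg[b])
            for b, n in zip(BASES, row)
        })
    return weights

def scan(seq, weights, threshold):
    """Slide the matrix along the direct strand; keep windows scoring >= threshold."""
    width = len(weights)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(weights, window))
        if score >= threshold:
            hits.append((start + 1, window, round(score, 2)))  # 1-based position
    return hits

# Hypothetical toy counts and sequences, for illustration only.
toy_counts = [[0, 1, 0, 4], [5, 0, 0, 0], [0, 0, 0, 5],
              [5, 0, 0, 0], [4, 0, 0, 1], [4, 0, 1, 0]]
toy_fasta = """>seqA hypothetical promoter
GCGCTATAAAAGGCGC
>seqB hypothetical promoter
GCGCGCGCGCGCGCGC
"""

weights = log_odds(toy_counts)
for name, seq in read_fasta(toy_fasta).items():
    print(name, scan(seq, weights, threshold=3.0))
```

Scanning the reverse strand as well would mean also scoring the reverse complement of each sequence; that option is left out here to match the exercise settings.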
Questions:
- What conclusion can you draw from the plot? Some regions are predicted to contain more than
5 TATA boxes, whereas a real promoter is expected to contain zero or one. Increasing the score threshold
should reduce the number of predicted sites.
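The effect of raising the threshold can be checked directly: the number of windows scoring at or above a cutoff can only shrink as the cutoff rises. The Python sketch below uses a hypothetical 4-column matrix with weight +1 for the TATA consensus base and -1 otherwise, so it only illustrates the trend, not real TATA-box scores. The helper score_range (a name introduced here) gives the lowest and highest score any window can reach, a useful reference when choosing a cutoff.

```python
BASES = "ACGT"

def score_range(weights):
    """Lowest and highest score any window can reach under this matrix."""
    low = sum(min(col.values()) for col in weights)
    high = sum(max(col.values()) for col in weights)
    return low, high

def hits_per_threshold(seq, weights, thresholds):
    """Count direct-strand windows scoring >= each threshold."""
    width = len(weights)
    scores = [
        sum(col[b] for col, b in zip(weights, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    ]
    return {t: sum(s >= t for s in scores) for t in thresholds}

# Hypothetical toy weights: +1 for the consensus base of TATA, -1 otherwise.
toy_weights = [{b: (1.0 if b == c else -1.0) for b in BASES} for c in "TATA"]
toy_seq = "TATATAAATTTA"

print(score_range(toy_weights))
print(hits_per_threshold(toy_seq, toy_weights, [0.0, 2.0, 4.0]))
```

With this toy matrix, windows matching 2, 3 or 4 consensus bases pass the lowest cutoff, but only exact TATA matches reach the maximum score.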
Results:
TRANSFAC has its own program for scanning promoter regions with the available
collection of matrices (MatInspector).