EMBO Practical Course on Sequence Analysis and Molecular Evolution
Profile Searching Practical
by Toby Gibson, Ewan Birney and Des Higgins, 9/7/97
In this practical we will use profile search tools available through the web. Profile searches are among the most sensitive sequence search methods currently available. The raw material for profile searching is a multiple sequence alignment. A profile assigns a score to each amino acid at each position in the alignment: conserved positions score more strongly than unconserved ones (whereas in a single-sequence query, all positions are weighted equally). We will look at the effect of building the profile with different residue substitution matrices, and compare the sensitivity of a profile search with that of a search using a single sequence as query.
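To make the scoring idea concrete, here is a minimal Python sketch of one classic way to build a profile: for each alignment column, every possible residue gets the average substitution score against the residues observed in that column (a simplified, unweighted form of the averaging used in Gribskov-style profiles). It is not necessarily how the Bioccelerator builds its profiles. The four-letter alphabet, the `TOY_MATRIX` values (loosely modelled on Blosum62) and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical toy substitution scores for a 4-letter alphabet (illustration only;
# loosely modelled on Blosum62, not a real matrix).
ALPHABET = "ILVD"
TOY_MATRIX = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("D", "D"): 6,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("I", "D"): -3, ("L", "D"): -4, ("V", "D"): -3,
}


def score(a: str, b: str) -> int:
    """Symmetric lookup: the matrix stores each unordered pair once."""
    return TOY_MATRIX[(a, b)] if (a, b) in TOY_MATRIX else TOY_MATRIX[(b, a)]


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """One dict per column: residue -> mean substitution score against
    the residues in that column. Gap characters ('-') are ignored."""
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        profile.append({
            a: sum(score(a, r) for r in residues) / len(residues)
            for a in ALPHABET
        })
    return profile


# Column 1 is fully conserved (I); column 2 is variable (V, L, I).
alignment = ["IV", "IL", "II"]
profile = build_profile(alignment)
print(max(profile[0].values()), max(profile[1].values()))  # 4.0 3.0
```

In the conserved first column the best residue (I) scores 4.0, while in the variable second column no residue scores above 3.0, so matches at conserved positions carry more weight in the search.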
WWW DB Tools
We will use:
Bioccelerators are installed at both the EBI and EMBL, which should be useful in case of server or network problems. The two servers are not identical, so if you try more than one you may notice some differences.
Step 0 Build a TFIIB multiple alignment
Step 1 Preparing a profile from a TFIIB alignment
TFIIB is a core transcription factor in both eukaryotes and archaea and has been quite strongly conserved through evolution. It contains a duplicated domain of ~90 residues, the TFIIB repeats, flanked by N- and C-terminal extensions. A second protein family in eukaryotes (not found in archaea) shares the same structural topology, and presumably a common ancestry, although its function is not conserved. Well-optimised searches with TFIIB queries should be able to find this second family, which has many divergent members; the number of its members that a search picks up is therefore a measure of the search's sensitivity.
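Counting the family members found only means something at a stated score cutoff, and it is worth counting unrelated entries above the same cutoff too, since a lower cutoff always finds more members at the price of more false positives. A minimal Python sketch, assuming the search output has been reduced to a mapping from entry name to score; the entry names, scores and the `count_hits` function are hypothetical:

```python
def count_hits(hits: dict[str, float], family: set[str], cutoff: float) -> tuple[int, int]:
    """Return (true positives, false positives) among entries scoring >= cutoff.

    hits:   entry name -> search score (hypothetical parsed search output)
    family: names of entries known to belong to the target family
    """
    above = [name for name, s in hits.items() if s >= cutoff]
    true_pos = sum(1 for name in above if name in family)
    return true_pos, len(above) - true_pos


# Invented example data: four known family members, one unrelated entry.
hits = {"FAM_A": 120.0, "FAM_B": 85.0, "FAM_C": 40.0, "OTHER_1": 90.0}
family = {"FAM_A", "FAM_B", "FAM_C", "FAM_D"}
print(count_hits(hits, family, cutoff=80.0))  # (2, 1)
```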
Step 2 BIC_Profilesearch with a TFIIB profile prepared with the Blosum62 matrix
The Bioccelerator is fast, dedicated hardware designed exclusively for dynamic programming (i.e. slow but sensitive) sequence comparison. It is built by the company Compugen. It can run several kinds of search, including basic Smith-Waterman, profile searches and frameshift-tolerant protein versus DNA comparisons. Today we will run the Profile Search, which finds the best-matching segments between a query profile (derived from a multiple alignment) and each database sequence, allowing gaps to be inserted at any position.
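The dynamic-programming step behind a profile search can be sketched as a Smith-Waterman local alignment in which the match score comes from the profile column rather than from a fixed substitution matrix. This minimal Python version assumes a single linear gap penalty (`gap`) and a default score (`missing`) for residues absent from a column, both hypothetical values; production profile tools typically use affine, often position-specific, gap penalties. Treat it as an illustration of the technique, not of the hardware's implementation.

```python
def profile_sw(profile: list[dict[str, float]], seq: str,
               gap: float = 4.0, missing: float = -4.0) -> tuple[float, tuple[int, int]]:
    """Smith-Waterman local alignment of a sequence against a profile.

    profile: one dict per alignment column, residue -> score
    gap:     linear penalty per gap position (assumed value)
    missing: score for a residue absent from a column's dict (assumed value)
    Returns the best local score and the 1-based (column, residue) cell
    where that best alignment ends.
    """
    n, m = len(profile), len(seq)
    # H[i][j] = best score of a local alignment ending at column i and residue j.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = H[i - 1][j - 1] + profile[i - 1].get(seq[j - 1], missing)
            skip_column = H[i - 1][j] - gap   # profile column faces a gap
            skip_residue = H[i][j - 1] - gap  # extra residue inserted in seq
            # The 0 lets an alignment start afresh anywhere (local alignment).
            H[i][j] = max(0.0, match, skip_column, skip_residue)
            if H[i][j] > best:
                best, best_end = H[i][j], (i, j)
    return best, best_end


# A hypothetical 3-column profile that favours I, then D, then L.
profile = [
    {"I": 4, "L": 2, "V": 3, "D": -3},
    {"I": -3, "L": -4, "V": -3, "D": 6},
    {"I": 2, "L": 4, "V": 1, "D": -4},
]
print(profile_sw(profile, "VVIDLDD"))  # (14.0, (3, 5))
```

The best segment is the `IDL` stretch at residues 3 to 5, matching all three profile columns without gaps.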
The search will take a couple of minutes (unless the Bic is busy). When it is finished you can look at the high-score list and alignments in the output.
Questions
Step 3 BIC_Profilesearch with the TFIIB profile prepared with the Gonnet Pam250 matrix
Now repeat the search using a profile made with a softer matrix, i.e. one that scores exchanges between similar residues more highly.
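The effect of a softer matrix on a profile column can be seen in a few lines of Python. The two matrices below are hypothetical two-letter fragments, not real Blosum62 or Gonnet Pam250 values: in the harder one an I-to-V exchange scores far less than an identity, while the softer one narrows that difference. Against a column of conserved isoleucines, a database sequence carrying valine is then penalised much less when the profile is built with the softer matrix.

```python
# Hypothetical matrix fragments keyed by unordered residue pair.
# frozenset("I") == {"I"} (identity); frozenset("IV") == {"I", "V"}.
HARD = {frozenset("I"): 5, frozenset("V"): 5, frozenset("IV"): 1}
SOFT = {frozenset("I"): 4, frozenset("V"): 4, frozenset("IV"): 3}


def column_score(column: str, residue: str, matrix: dict) -> float:
    """Mean substitution score of `residue` against every residue in a column."""
    return sum(matrix[frozenset((residue, r))] for r in column) / len(column)


column = "IIII"  # a fully conserved isoleucine position
for name, matrix in (("hard", HARD), ("soft", SOFT)):
    ident = column_score(column, "I", matrix)
    subst = column_score(column, "V", matrix)
    print(f"{name}: I={ident} V={subst} ratio={subst / ident:.2f}")
# hard: I=5.0 V=1.0 ratio=0.20
# soft: I=4.0 V=3.0 ratio=0.75
```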
First make a new profile:
Now run another search:
Now you can compare the results obtained with the Gonnet Pam250 and Blosum62 matrices.
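One simple way to make this comparison systematic is to list which entries each search reports above its own cutoff, and which are unique to one of them. A minimal Python sketch, assuming each high-score list has been reduced to a mapping from entry name to score. Because raw scores from profiles built with different matrices are on different scales, each list gets its own cutoff here; the entry names, scores, cutoffs and the `compare_hit_lists` function are invented for illustration.

```python
def compare_hit_lists(hits_a: dict[str, float], cutoff_a: float,
                      hits_b: dict[str, float], cutoff_b: float) -> dict[str, set[str]]:
    """Partition entries scoring at or above each search's own cutoff."""
    above_a = {name for name, s in hits_a.items() if s >= cutoff_a}
    above_b = {name for name, s in hits_b.items() if s >= cutoff_b}
    return {
        "both": above_a & above_b,
        "only_a": above_a - above_b,
        "only_b": above_b - above_a,
    }


# Invented high-score lists from two hypothetical searches.
blosum62_hits = {"ENTRY_1": 300.0, "ENTRY_2": 60.0, "ENTRY_3": 45.0}
gonnet_hits = {"ENTRY_1": 500.0, "ENTRY_2": 110.0, "ENTRY_4": 95.0}
result = compare_hit_lists(blosum62_hits, 50.0, gonnet_hits, 90.0)
print(sorted(result["only_b"]))  # ['ENTRY_4']
```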
Questions
Step 4 Bic_SW search with the human TFIIB sequence
Now set up a search with TFIIB_Human, in order to determine whether profile searches really are more sensitive than searches using a single sequence as query.
Now you can compare the results of the single sequence with the profile query.
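A single query sequence can be viewed as a profile built from a one-sequence alignment: each position simply scores residues with the substitution-matrix row of the query residue, so a substitution that the family is known to tolerate is penalised as heavily as any other. This Python sketch uses an averaging profile with hypothetical toy scores (three-letter alphabet, values loosely modelled on Blosum62) to show the difference at a variable position; the sequences are invented.

```python
# Hypothetical toy scores keyed by unordered residue pair (illustration only).
TOY = {frozenset("I"): 4, frozenset("L"): 4, frozenset("V"): 4,
       frozenset("IL"): 2, frozenset("IV"): 3, frozenset("LV"): 1}


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Average-score profile: residue -> mean score against each column."""
    return [
        {a: sum(TOY[frozenset((a, r))] for r in column) / len(column) for a in "ILV"}
        for column in zip(*alignment)
    ]


single = build_profile(["IV"])              # the query sequence on its own
family = build_profile(["IV", "IL", "II"])  # query plus two hypothetical relatives
# Score for L at position 2, where the family shows V, L and I:
print(single[1]["L"], round(family[1]["L"], 2))  # 1.0 2.33
```

At the conserved first position both profiles give identical scores; only at the variable position does the family profile reward a leucine that the single sequence treats as a poor match.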
Questions
Take Home Lessons
Optimisation of the search setup is vital: in practice this means running test searches. Choosing a good residue substitution matrix is important. Optimisation of gap penalties is also critical (we did not look at this today). The TF2B profile is actually only slightly more sensitive than the most optimised query with a single TFIIB sequence. (We were a bit wicked and chose a poor starting query: TF2B_Human does much better, as yeast has diverged further from the common ancestral sequence.) However, by adding the BRF sequences (entries TF3B*) to the profile, and then the best-alignable cyclins, we would bring in more and more divergent cyclins. The RB sequences are also genuine hits, but have only a single domain. Reciprocal searches with profiles based on the cyclin box and on the conserved motif in the RB family would also need to be run; in each case they would support the idea that these families are related. How this was done in practice, together with some tips on setting up and evaluating profile searches, is described in the references below.
References