Comparative Gene Prediction |
Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (called exons) within genes, the amount of non-coding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.
Finding differences, however, is just as important as finding similarities. The comparative approach also points out features that are unique to a given phylogenetic group or to a particular species. Species-specific functions can be involved in, for instance, pathogenicity or resistance to antibiotics, but they can also underlie more complex phenotypic traits such as the human ability to speak.
Ab initio gene finding programs integrate different measures obtained from the raw genomic sequence, such as G+C content, periodicity of coding regions, and detection of exon boundary signals. The obvious next step was to include homology information from the growing annotation databases, such as SWISSPROT and EMBL/GenBank. Modern gene prediction programs can also integrate the data obtained from the comparison of two genomes to improve the exonic structure of already predicted genes. Furthermore, novel genes not represented in the annotation databases can be found as well. |
Overview |
In this section we will run several ab initio gene prediction programs on a particular genomic DNA sequence and compare their results with the genes predicted by a gene-finding program that uses genomic homology. For each program we will obtain a prediction of a candidate gene, and we will analyze the differences between the predictions and the annotation of the real gene in both human and mouse.
The programs we are going to use are geneid, genscan
and fgenesh, which were used in the previous practical
exercise. blast will be used to compare the human and mouse
sequences. Finally, sgp2 (syntenic gene prediction tool) will
predict genes taking into account the homology found between these two
species.
|
A genomic DNA sequence |
We are going to work with this
Human sequence, which is stored in FASTA format. We also provide
the homologous region in the mouse genome in this
Mouse sequence.
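Before submitting the sequences to any program, it can help to check that each FASTA file loads correctly and to look at basic properties such as length and G+C content. The following is a minimal Python sketch, not part of the original exercise: the function names are illustrative, and the toy record only stands in for the real human and mouse files.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


# Toy record (made up); replace with open("your_file.fa") for real data.
example = StringIO(">toy_human\nATGCGCGTAN\nGGCCTTAA\n")
for name, seq in read_fasta(example):
    print(name, len(seq), round(gc_content(seq), 3))
```

Ambiguous bases such as N are excluded from the denominator here; that is a design choice of this sketch, and other tools may count them differently.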
|
Ab initio gene finding |
In the first approach, we will use all the ab initio tools
from the Gene Prediction section and compare the results of the three
programs. You can open a simple text editor and paste in the results
of each gene-finding program in order to compare the coordinates of the
predicted exons.
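As an alternative to comparing coordinates by eye, the exon lists can be tabulated programmatically. The sketch below is only an illustration: the coordinates are invented, not real output of geneid, genscan or fgenesh, and you would type in the (start, end) pairs reported by each program.

```python
# Hypothetical exon coordinates (start, end), 1-based and inclusive.
# These numbers are made up for illustration only.
predictions = {
    "geneid": [(1200, 1350), (2100, 2260), (3400, 3555)],
    "genscan": [(1200, 1350), (2100, 2275), (3400, 3555), (5000, 5120)],
    "fgenesh": [(1200, 1350), (2100, 2260), (3400, 3555)],
}


def exon_support(predictions):
    """Map each exact exon (start, end) to the sorted list of programs predicting it."""
    support = {}
    for program, exons in predictions.items():
        for exon in set(exons):
            support.setdefault(exon, []).append(program)
    return {exon: sorted(programs) for exon, programs in sorted(support.items())}


for (start, end), programs in exon_support(predictions).items():
    print(f"{start:>6} {end:>6}  {len(programs)}/{len(predictions)}  {','.join(programs)}")
```

Exons are matched only when both boundaries agree exactly, so two predictions that share a start but differ at the end appear as separate rows. This makes boundary disagreements easy to spot.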
|
In order to use genscan follow these steps:
|
In order to use fgenesh follow these steps:
Some questions:
Now run all the ab initio programs on the mouse sequence. Some questions:
Here you can find a plot with the predictions of the ab initio gene finding tools in the mouse genome. Do you find any common pattern between the human and mouse predictions? |
Comparing human and mouse sequences |
In order to use blastn follow these steps:
In order to use tblastx follow these steps:
Are all the predicted exons supported by conserved regions? Here you can find a plot with the results of the blastn and tblastx alignments. There are other programs to align and visualize pairs of large genomic sequences: gff2aplot, Vista and Pipmaker. |
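The question of whether a predicted exon is supported by conserved regions comes down to interval overlap. The following Python sketch makes that check explicit. It assumes the aligned segments reported by blastn or tblastx have already been reduced to (start, end) pairs on the human sequence. All coordinates are invented, and the 50% coverage threshold is an arbitrary choice for illustration.

```python
def overlap(a, b):
    """Number of positions shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covered_fraction(exon, regions):
    """Fraction of the exon covered by the union of the given regions."""
    # Clip every overlapping region to the exon and sort by start.
    clipped = sorted(
        (max(exon[0], start), min(exon[1], end))
        for start, end in regions
        if overlap(exon, (start, end)) > 0
    )
    covered, last_end = 0, exon[0] - 1
    for start, end in clipped:
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / (exon[1] - exon[0] + 1)


# Hypothetical coordinates on the human sequence (made up).
predicted_exons = [(1200, 1350), (2100, 2260), (5000, 5120)]
conserved_regions = [(1180, 1300), (1290, 1360), (2150, 2200)]

MIN_COVERAGE = 0.5  # arbitrary threshold, an assumption of this sketch
for exon in predicted_exons:
    fraction = covered_fraction(exon, conserved_regions)
    status = "supported" if fraction >= MIN_COVERAGE else "not supported"
    print(exon, f"{fraction:.2f}", status)
```

Overlapping alignments are merged before counting, so a position covered by several segments is counted once.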
Using comparative gene finding tools |
In this section we will use sgp2 to make the predictions
using the conservation pattern between human and mouse.
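If the predictions are saved in GFF format (an assumption here; check which output formats the server offers), the exon coordinates can be extracted with a few lines of Python. The sketch below relies only on the standard nine tab-separated GFF columns. The sequence name, source labels, feature labels and coordinates in the example are invented.

```python
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str


def parse_gff(lines: Iterable[str]) -> List[Feature]:
    """Parse GFF-style lines, skipping comments, blanks and malformed rows."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a feature line
        features.append(
            Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6])
        )
    return features


# Made-up example lines; real output will differ.
example = [
    "# hypothetical predictions",
    "human_seq\tsgp2\tFirst\t1200\t1350\t5.1\t+\t0\tgene_1",
    "human_seq\tsgp2\tInternal\t2100\t2260\t7.4\t+\t1\tgene_1",
    "human_seq\tsgp2\tTerminal\t3400\t3555\t3.9\t+\t2\tgene_1",
]
for f in parse_gff(example):
    print(f.feature, f.start, f.end, f.strand)
```

The resulting (start, end) pairs can then be compared with the ab initio predictions on the same sequence.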
Some questions:
Here you can find the human predictions, the mouse predictions, and the human and mouse predictions together with the tblastx similarity regions. Other programs also use genomic comparison to improve gene prediction, for example twinscan and slam. |
Current annotations in the genomic DNA sequence |
Go to the UCSC genome browser and look for the annotation of this region in the human genome. Open another web browser window and look for the annotation of the mouse sequence in the mouse genome.
Are the predictions we have obtained consistent with the annotation in
the UCSC genome browser?
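One way to make "consistent with the annotation" concrete is to count how many exons match exactly. The Python sketch below computes exon-level sensitivity (the fraction of annotated exons that were predicted exactly) and specificity in the sense commonly used in gene-prediction evaluation (the fraction of predicted exons that are exactly right). The coordinates are invented, and both lists are assumed to use the same coordinate system.

```python
def exon_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exact-exon level."""
    predicted, annotated = set(predicted), set(annotated)
    correct = predicted & annotated  # exons with both boundaries right
    sensitivity = len(correct) / len(annotated) if annotated else 0.0
    specificity = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical coordinates (start, end); not taken from the real annotation.
annotated_exons = [(1200, 1350), (2100, 2260), (3400, 3555), (4100, 4230)]
predicted_exons = [(1200, 1350), (2100, 2275), (3400, 3555)]

sn, sp = exon_accuracy(predicted_exons, annotated_exons)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

Exact matching is strict: an exon with one wrong boundary counts as both a missed exon and a wrong prediction, so it is worth inspecting near misses separately.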
|