Genefinding: running a program on your computer
Follow these steps:
- Connect to the GBL website.
- Select Software
- Select geneid, and then the geneid homepage
- Look around the page to answer these questions:
- If you have problems with the program, what can you do?
- If you would like to see an example of geneid output before testing the program, where should you go?
- You are going to download the program. What do you have to do?
- Let's download the program:
- Get the geneid v1.1 full distribution (save the file geneid_v1.1.Feb_26_2003.tar.gz)
- The file is a compressed archive. Unpack it in your terminal:
tar -zxvf geneid_v1.1.Feb_26_2003.tar.gz
- Type cd geneid, and then make
- Type: bin/geneid -h
- Take a look at the list of options
- Save the sequence HS307871.fa in your working directory
- Run the gene prediction:
bin/geneid -P param/human3iso.param HS307871.fa
- Add the option -v and try to discover how it works
- Compare the prediction to the annotated gene
- Reannotation from experimental results:
In the first practice, you observed that the first exon of the gene (1107..1126) was not predicted accurately.
- Can you verify whether geneid is actually building this exon or not by running geneid to predict exons? (hint: look for the option to do it)
- The solution for the previous step is:
bin/geneid -xGP param/human3iso.param HS307871.fa | grep 1107
- Let's imagine this exon has been verified experimentally, and we want to rebuild the prediction with it. Take a look at this exon.
- Reannotation process. Run the following command and analyze the resulting prediction:
bin/geneid -P param/human3iso.param -R exon.gff HS307871.fa
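The grep 1107 filter used in the solution above matches those digits anywhere on a line, so it would also let through a coordinate such as 11070 or 11107. A minimal shell sketch of a stricter filter follows. It assumes that the -G output follows the standard GFF column order, where the start coordinate is column 4; the helper name exact_start and the three sample lines are hypothetical and are not real geneid output.

```shell
# Hypothetical GFF-style lines (seqname, source, feature, start, end, score,
# strand, frame). They only illustrate the filter; they are not geneid output.
sample='HS307871 geneid_v1.1 First 1107 1126 0.50 + 0
HS307871 geneid_v1.1 Internal 11070 11200 1.20 + 1
HS307871 geneid_v1.1 Internal 10500 11107 1.20 + 2'

# exact_start N: keep only the lines whose start coordinate (column 4) equals N.
exact_start() {
  awk -v start="$1" '$4 == start'
}

# Plain grep counts every line that contains the digits 1107 anywhere.
printf '%s\n' "$sample" | grep -c 1107

# The column test keeps only the exon that really starts at 1107.
printf '%s\n' "$sample" | exact_start 1107
```

With geneid installed, the same helper could take the place of grep 1107 at the end of the exon-prediction pipeline.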
- Advantages of working in a Unix-like system
Using geneid on the command line together with standard Unix programs (grep, wc, gawk, sort, ...), we can easily parse geneid output to answer the following questions:
- A - How many putative exons does geneid predict on this sequence?
- B - And how many putative acceptor sites?
- C - Which start codon has the highest score between coordinates 500 and 1500?
- D - How many putative exons predicted by geneid contain the GGGGG motif at the amino acid level?
- E - Which is the longest single-exon gene predicted by geneid? And the shortest?
## Solutions:
- A
bin/geneid -xoP param/human3iso.param HS307871.fa | grep -v "#" | wc -l
3612
- B
bin/geneid -oaP param/human3iso.param HS307871.fa | grep -v "#" | wc -l
368
- C
bin/geneid -obP param/human3iso.param HS307871.fa | gawk '{if ($2>500 && $3<1500) print}' | sort -k4n
Start 998 1000 3.87 - CCAAGAGCGTCGCCATGTTG
- D
bin/geneid -oxP param/human3iso.param HS307871.fa | grep GGGGG | wc -l
18
- E
bin/geneid -osP param/human3iso.param HS307871.fa | sort -k12n
Single 2104 2163 -6.26 - 0 0 0.00 -1.75 -0.51 0.00 20
Single 827 1174 -7.24 + 0 0 -1.27 0.00 -3.70 0.00 116
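The lines quoted in solutions C and E can be checked without rerunning geneid. The sketch below assumes, from those lines alone, that columns 2 and 3 hold the start and end coordinates, and that column 12 of a Single line is the length in amino acids (an inference: 60 nt / 3 = 20 and 348 nt / 3 = 116). The helper names in_window and span_lengths are hypothetical.

```shell
# Lines quoted in solutions C and E of this practice.
start='Start 998 1000 3.87 - CCAAGAGCGTCGCCATGTTG'
singles='Single 2104 2163 -6.26 - 0 0 0.00 -1.75 -0.51 0.00 20
Single 827 1174 -7.24 + 0 0 -1.27 0.00 -3.70 0.00 116'

# in_window LO HI: keep sites lying strictly between the two coordinates.
in_window() {
  awk -v lo="$1" -v hi="$2" '$2 > lo && $3 < hi'
}

# span_lengths: print start, end, length in nt, length in codons, column 12.
span_lengths() {
  awk '{ nt = $3 - $2 + 1; print $2, $3, nt, nt / 3, $12 }'
}

# The start codon at 998..1000 lies inside the 500..1500 window.
printf '%s\n' "$start" | in_window 500 1500

# Coordinates and column 12 agree for both single-exon genes.
printf '%s\n' "$singles" | span_lengths

# Numeric sort on column 12: shortest gene first, longest last.
printf '%s\n' "$singles" | sort -k12,12n
```

Because the sort is ascending, the shortest single-exon gene is the first line of the output and the longest is the last.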
Josep F. Abril, Enrique Blanco, Sergi Castellano, Genis Parra and Roderic Guigó © 2002