geneid documentation: 6. Introducing external information: annotations


Description:
geneid can integrate ab initio predictions with external evidence (annotations), such as annotated genes. The external evidence is read from an additional file in gff format, which geneid is instructed to read with the command line option -R filename. This input file must be sorted by starting position (column 4 in gff).
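
The sorting requirement above can be met with a small preprocessing step. The following is a minimal Python sketch, not part of geneid: the function name sort_gff_by_start is hypothetical, and it assumes that all records belong to a single sequence and that columns are separated by tabs or spaces.

```python
def sort_gff_by_start(lines):
    """Return gff lines with records ordered by start position (column 4)."""
    comments, records = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # drop empty lines
        if stripped.startswith("#"):
            comments.append(stripped)  # keep comment lines at the top
        else:
            records.append(stripped)
    # split() with no argument handles both tab- and space-separated columns;
    # list.sort is stable, so records with equal starts keep their input order
    records.sort(key=lambda rec: int(rec.split()[3]))
    return comments + records


# Hypothetical unsorted evidence, reusing records from this page
example = [
    "AE002566  external First     30732    30775   -1.11 -  0  gene_2",
    "AE002566  external Terminal  21839    22922   18.37 -  1  gene_2",
    "AE002566  external Internal  23679    24029    7.99 -  1  gene_2",
]
sorted_lines = sort_gff_by_start(example)
for line in sorted_lines:
    print(line)
```

The sorted lines can then be written to the file passed to -R filename.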

The options -O filename and -R filename behave differently. With -O, only the elements extracted from the file are assembled; with -R, gene predictions are built from both the file records and geneid's own predictions.

If the elements in the input file are assigned a score, they will "compete" with the original geneid predictions (if any) for a place in the final gene structure. For instance, this record will compete against ab initio predictions:

AE002566   lab_XX  Internal    66255    66323    3.27  +  1

If no score is given for an element (a dot "." in the score column), the element is treated as mandatory (forced) in the final gene prediction, unless a conflicting element with no score is also given in the input file. For instance, this record will be in the final prediction:

AE002566   lab_XX  Internal    66255    66323    .  +  1
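
The distinction between scored and forced records depends only on the score column (column 6 in gff). As a minimal Python sketch for checking an evidence file before running geneid (the function name evidence_kind and the labels "scored" and "forced" are illustrative assumptions, not geneid terminology beyond the word "forced" used above):

```python
def evidence_kind(gff_line):
    """Classify a gff record by its score column (column 6)."""
    fields = gff_line.split()
    score = fields[5]
    # A dot means no score was given: the element is mandatory (forced).
    # Any other value is a score, so the element competes with predictions.
    return "forced" if score == "." else "scored"


scored_record = "AE002566   lab_XX  Internal    66255    66323    3.27  +  1"
forced_record = "AE002566   lab_XX  Internal    66255    66323    .  +  1"
print(evidence_kind(scored_record))
print(evidence_kind(forced_record))
```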

The frame can either be set in column 8 of the gff format or left unspecified by using the wildcard "." when it is unknown. In the latter case, geneid generates 3 equivalent elements (one per possible frame), computes the corresponding remainder for each, and keeps frame/remainder consistency during assembly. For instance, this record is internally expanded into these 3 exons, which are added to the set of candidate exons for the final gene prediction:

AE002566   lab_XX  Internal    66255    66323    3.27  +  .

AE002566   lab_XX  Internal    66255    66323    3.27  +  0
AE002566   lab_XX  Internal    66255    66323    3.27  +  1
AE002566   lab_XX  Internal    66255    66323    3.27  +  2
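
The expansion described above can be imitated outside geneid. The following Python sketch only reproduces the frame substitution; it does not compute the remainder, and the function name expand_unknown_frame is a hypothetical helper rather than anything taken from the geneid source.

```python
def expand_unknown_frame(gff_line):
    """Expand a record with frame "." (column 8) into one record per frame."""
    fields = gff_line.split()
    if fields[7] != ".":
        # Frame already known: nothing to expand
        return [" ".join(fields)]
    expanded = []
    for frame in ("0", "1", "2"):
        copy = list(fields)
        copy[7] = frame  # substitute the wildcard with a concrete frame
        expanded.append(" ".join(copy))
    return expanded


record = "AE002566   lab_XX  Internal    66255    66323    3.27  +  ."
candidates = expand_unknown_frame(record)
for candidate in candidates:
    print(candidate)
```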

By using the optional group field (column 9 in gff format), the user can specify whether an annotated gene (annotation) introduced in the input file must be preserved as a whole if it is incorporated in the final results, or whether geneid predictions can be mixed into that annotation. For instance, given annotation (1), the gene will be preserved in the final output. Given annotation (2), which describes the same gene but without a group identifier, we can obtain predictions such as (3):

(1)
AE002566  external Terminal  21839    22922   18.37 -  1  gene_2
AE002566  external Internal  23679    24029    7.99 -  1  gene_2
AE002566  external First     30732    30775   -1.11 -  0  gene_2

(2)
AE002566  external Terminal  21839    22922   18.37 -  1
AE002566  external Internal  23679    24029    7.99 -  1
AE002566  external First     30732    30775   -1.11 -  0

(3)
AE002566  external    Terminal  21839    22922   18.37 -  1
AE002566  external    Internal  23679    24029    7.99 -  1
AE002566  geneid_v1.2 Internal  28002    28007    1.14 -  1
AE002566  external    First     30732    30775   -1.11 -  0
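
To see which records of an evidence file carry a group identifier, and would therefore be kept together, the group column can be inspected before running geneid. This is a minimal Python sketch under the assumption that the group is a single whitespace-free token in column 9; the function name group_annotations is hypothetical and the code does not reproduce geneid's assembly.

```python
def group_annotations(gff_lines):
    """Map each group identifier (column 9) to its records.

    Records without a group field are collected under the key None.
    """
    groups = {}
    for line in gff_lines:
        fields = line.split()
        group_id = fields[8] if len(fields) > 8 else None
        groups.setdefault(group_id, []).append(fields)
    return groups


evidence = [
    "AE002566  external Terminal  21839    22922   18.37 -  1  gene_2",
    "AE002566  external Internal  23679    24029    7.99 -  1  gene_2",
    "AE002566  external Internal  28002    28007    1.14 -  1",
    "AE002566  external First     30732    30775   -1.11 -  0  gene_2",
]
groups = group_annotations(evidence)
for group_id, records in groups.items():
    print(group_id, len(records))
```

In this hypothetical input, the three gene_2 records form one annotation, while the ungrouped record is a free-standing element.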




Enrique Blanco Garcia © 2003