SGP2 (http://genome.crg.es/software/index.php#SGP2) predictions combine geneid predictions with a tblastx comparison of the human genome (hg38) against the mouse genome (mm10). SGP2 was run on the repeat-masked FASTA sequences of the human genome assembly hg38. The homology evidence it uses consists of similarity regions (SRs) from tblastx, obtained by comparing a masked version of the human genome assembly against the mouse assembly mm10.

Predictions were obtained per chromosome/fragment, and output was generated in the following formats:

  chr1.sgp2  (geneid)
  chr1.gff   (geneid predictions in gff2 format)
  chr1.gff3  (geneid predictions in gff3 format)
  chr1.gtf   (geneid predictions in gtf format)
  chr1.cds   (nucleotide sequences of predicted coding sequences)
  chr1.prot  (amino acid sequences of predicted proteins)

SGP2 predicted 36,013 protein-coding genes on this version of the human genome.
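
As a rough illustration of how a gene count could be tallied from the per-chromosome gtf output, the Python sketch below counts distinct gene identifiers per sequence. It assumes the standard GTF layout (nine tab-separated columns, with a gene_id "..." attribute in the ninth); the exact attribute names in the SGP2 files are not confirmed here, and the function name count_genes and the sample lines are hypothetical.

```python
import re

# Assumption: the ninth GTF column carries an attribute of the form gene_id "NAME";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def count_genes(gtf_lines):
    """Return {sequence name: number of distinct gene_id values}."""
    genes_by_seq = {}
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has nine tab-separated columns; ignore anything shorter.
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if match:
            genes_by_seq.setdefault(fields[0], set()).add(match.group(1))
    return {seq: len(ids) for seq, ids in genes_by_seq.items()}


# Hypothetical sample records, not taken from the real SGP2 output.
sample = [
    "# illustrative header\n",
    'chr1\tSGP2\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "g1.1";\n',
    'chr1\tSGP2\tCDS\t300\t400\t.\t+\t1\tgene_id "g1"; transcript_id "g1.1";\n',
    'chr1\tSGP2\tCDS\t900\t990\t.\t-\t0\tgene_id "g2"; transcript_id "g2.1";\n',
]
counts = count_genes(sample)
```

To apply this to a real file, pass an open file handle (for example, open("chr1.gtf")) in place of the sample list and sum the per-sequence counts across all chromosome/fragment files.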