some problems of easyRNASeq: about the gtf files

Hu Fuyan [guest] guest at bioconductor.org
Tue Mar 19 05:27:52 CET 2013


I want to use easyRNASeq to get exon counts. But I found a strange thing:

I have two human annotation files from different sources: one (Homo_sapiens.GRCh37.70.gtf.gz) is from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens); the other (genes.gtf, Ensembl) is from Illumina iGenomes (http://tophat.cbcb.umd.edu/igenomes.html).

The two annotation files are almost the same, with only small differences such as the order of the exons and of the attributes.
When I ran easyRNASeq, I used both gtf files to check the result.

I got different results for the SLC25A13 exons.


 -- output of sessionInfo(): 

First, I got my BAM file from TopHat.

When I used Homo_sapiens.GRCh37.70.gtf as my annotation file in easyRNASeq, I got the result:

"\"ENSG00000004864\"_1" 2
"\"ENSG00000004864\"_2" 4
"\"ENSG00000004864\"_3" 16
"\"ENSG00000004864\"_4" 3
"\"ENSG00000004864\"_5" 7
"\"ENSG00000004864\"_6" 8
"\"ENSG00000004864\"_7" 5
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_9" 4
"\"ENSG00000004864\"_10" 1
"\"ENSG00000004864\"_11" 6
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_13" 4
"\"ENSG00000004864\"_14" 6
"\"ENSG00000004864\"_15" 8
"\"ENSG00000004864\"_16" 5
"\"ENSG00000004864\"_17" 3
"\"ENSG00000004864\"_18" 25



But when I used the gtf file from Illumina iGenomes, I got a wrong result (this can be checked by viewing the BAM file in IGV):

"\"ENSG00000004864\"_18" 25
"\"ENSG00000004864\"_17" 13
"\"ENSG00000004864\"_2" 11
"\"ENSG00000004864\"_16" 3
"\"ENSG00000004864\"_1" 8
"\"ENSG00000004864\"_15" 5
"\"ENSG00000004864\"_14" 8
"\"ENSG00000004864\"_6" 6
"\"ENSG00000004864\"_13" 6
"\"ENSG00000004864\"_5" 0
"\"ENSG00000004864\"_3" 4
"\"ENSG00000004864\"_4" 4
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_11" 4
"\"ENSG00000004864\"_10" 6
"\"ENSG00000004864\"_9" 1
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_7" 4

 

I am very confused by the different results.
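To make the comparison easier to read, the two sets of counts listed above can be aligned by exon label. This is only a minimal sketch in base R: the vector names (ensembl, igenomes) are made up here, and the numbers are copied by hand from the two listings.

```r
# Counts obtained with Homo_sapiens.GRCh37.70.gtf, exons _1 to _18
ensembl <- c(2, 4, 16, 3, 7, 8, 5, 4, 4, 1, 6, 4, 4, 6, 8, 5, 3, 25)
names(ensembl) <- as.character(1:18)

# Counts obtained with the iGenomes genes.gtf, in the order they were printed
igenomes <- c("18" = 25, "17" = 13, "2" = 11, "16" = 3, "1" = 8, "15" = 5,
              "14" = 8, "6" = 6, "13" = 6, "5" = 0, "3" = 4, "4" = 4,
              "12" = 4, "11" = 4, "10" = 6, "9" = 1, "8" = 4, "7" = 4)

# Reorder the second vector so both are indexed by the same exon label
igenomes_aligned <- igenomes[names(ensembl)]

# Side-by-side table with the per-exon difference
comparison <- data.frame(
  exon     = names(ensembl),
  ensembl  = unname(ensembl),
  igenomes = unname(igenomes_aligned),
  diff     = unname(igenomes_aligned - ensembl)
)
print(comparison)
```

Aligned this way, exon _5 has 7 reads with the Ensembl file and 0 with the iGenomes file.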

Here is my main call to easyRNASeq:




count_gene_gtf_ensembl.table <- easyRNASeq(filesDirectory=getwd(),
    filenames="accepted_hits.sorted.bam",
    organism="Hsapiens",
    chr.sizes="auto",
    annotationMethod="gtf",
    annotationFile="/x400ifs-accel/ntteam/hufuyan/humanindex/Ensembl/ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2012-03-09-04-49-46/Genes/genes.gtf",
    format="bam",
    gapped=TRUE,
    count="exon")



When I changed the order of the exons of gene SLC25A13 in genes.gtf (Illumina) to match Homo_sapiens.GRCh37.70.gtf and ran easyRNASeq again, I got the right exon counts.
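This suggests that the result depends on the order of the exon records in the gtf file. A minimal base R sketch for checking whether two gtf files list the same exons of a gene in a different order is below. The toy lines and their coordinates are invented for illustration, and exons_of is a hypothetical helper, not part of easyRNASeq.

```r
# Toy GTF lines; the coordinates are made up for illustration only.
gtf_a <- c(
  "7\tprotein_coding\texon\t100\t200\t.\t-\t.\tgene_id \"ENSG00000004864\"; exon_number \"2\";",
  "7\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id \"ENSG00000004864\"; exon_number \"1\";"
)
gtf_b <- rev(gtf_a)  # same records, opposite order

# Hypothetical helper: return the exon rows of one gene, in file order.
exons_of <- function(lines, gene_id) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  tab <- data.frame(
    seqname = vapply(fields, `[`, character(1), 1),
    feature = vapply(fields, `[`, character(1), 3),
    start   = as.integer(vapply(fields, `[`, character(1), 4)),
    end     = as.integer(vapply(fields, `[`, character(1), 5)),
    attr    = vapply(fields, `[`, character(1), 9),
    stringsAsFactors = FALSE
  )
  keep <- tab$feature == "exon" & grepl(gene_id, tab$attr, fixed = TRUE)
  tab[keep, c("seqname", "start", "end")]
}

a <- exons_of(gtf_a, "ENSG00000004864")
b <- exons_of(gtf_b, "ENSG00000004864")

# Same exons in both files?
same_set <- setequal(paste(a$start, a$end), paste(b$start, b$end))
# Listed in the same order?
same_order <- identical(a$start, b$start)

# Sorting both by start coordinate gives a common order
b_sorted <- b[order(b$start), ]
```

For real files, the lines could be read with readLines(gzfile(path)) after dropping comment lines that start with "#"; that step is left out here to keep the sketch self-contained.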

 

Another problem is that I got the warning: "You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." I also got this warning when I used the gtf files from UCSC.
How can I fix it?
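For context, Ensembl annotation names chromosomes "1", "2", "MT", while the UCSC convention is "chr1", "chr2", "chrM". The sketch below shows one way the names could be converted in base R before the annotation is used. It is an assumption that this naming difference is what the warning refers to; to_ucsc is a hypothetical helper, not an easyRNASeq function, and it does not handle unplaced scaffolds or patches.

```r
# Example chromosome names as they might appear in column 1 of a gtf file
seqnames <- c("1", "2", "X", "MT", "chr7")

# Hypothetical helper: add the "chr" prefix where missing and rename MT to chrM
to_ucsc <- function(x) {
  x <- ifelse(grepl("^chr", x), x, paste0("chr", x))
  x[x == "chrMT"] <- "chrM"
  x
}

ucsc_names <- to_ucsc(seqnames)
print(ucsc_names)

# Names that did not already follow the UCSC convention
non_compliant <- seqnames[!grepl("^chr", seqnames)]
```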


--
Sent via the guest posting facility at bioconductor.org.


