[BioC] goseq and transcript length data

Tom [guest] guest at bioconductor.org
Tue Jun 4 00:22:18 CEST 2013

Dear All,

I would appreciate your time and expertise. I am trying to use goseq for GO analysis of my RNA-seq experiments.

I used the TopHat -> Cufflinks pipeline for differential expression (DE), with the mouse mm10 build for annotation.

I am trying to build the gene length database myself, given that the current version of goseq does not support the mm10 build.

1, Cufflinks seems to ignore the original gene identifiers that come with mm10 and assigns its own, but it does keep the gene name in each record, so I will use the gene name as the identifier. I have already built both the assayed-gene vector and the DE-gene vector from the gene names.

2, Then comes the transcript length issue. I noticed that one of the Cufflinks output files, genes.fpkm_tracking, contains both the gene name and gene length information. The length column has this format: chr1:4807892-4846735. This is for the Lypla1 gene. But this coordinate range includes introns too, so I cannot simply get the transcript length by subtracting the first number from the second. I went through every output file of Cufflinks/Cuffdiff and could not find one containing the transcript length information. Where can I get the transcript length information?

3, In my experiment I have only 39 DE genes. Do you think it is even worthwhile for me to use goseq, or should I simply go to DAVID?




Sent via the guest posting facility at bioconductor.org.
