[BioC] goseq and transcript length data

Nadia Davidson nadia.davidson at mcri.edu.au
Thu Jun 6 09:22:10 CEST 2013



> I am trying to build the gene length database by myself, given that the 
> current version of goseq does not support the mm10 build.
> 
> 2, Then it comes to the transcript length issue. I noticed that one of the 
> Cufflinks output files, genes.fpkm_tracking, contains both the gene name and 
> gene length information. The length column has this format: 
> chr1:4807892-4846735. This is for the Lypla1 gene. But this sequence range 
> includes introns too, so I cannot simply get the transcript length 
> by subtracting the first number from the second. I went through every 
> output file of cufflinks/cuffdiff and could not find one containing the 
> transcript length information. Where can I get the transcript length 
> information? 
>
> 3, In my experiment, I only have 39 DE genes. Do you think it is even worth 
> using goseq? Or should I simply go to DAVID?


Hi Tom,

If you use goseq 1.12 or later it should fetch the mm10 lengths. 
Which annotation are you using?

Getting the lengths from Cufflinks genes can be fiddly in my experience. 
You can do it by reading the annotation file into R and calculating 
intervals with GRanges.
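
To make the interval calculation concrete, here is a minimal sketch in 
plain Python rather than R/GRanges. It assumes the annotation is a GTF file 
with "exon" rows and a gene_id attribute, and it defines gene length as the 
total width of the union of a gene's exons, which is only one of several 
reasonable definitions. The function name merged_exon_lengths and the toy 
coordinates are made up for illustration; they are not real Lypla1 exons.

```python
import re
from collections import defaultdict


def merged_exon_lengths(gtf_lines):
    """Return {gene_id: total width of the union of that gene's exons}.

    Assumes GTF conventions: tab-separated, 1-based inclusive coordinates,
    feature type in column 3, attributes in column 9. It also assumes each
    gene_id lives on a single chromosome and strand.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows contribute to length
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # overlapping or adjacent exon: extend the current block
                cur_end = max(cur_end, end)
            else:
                # gap (an intron): close the block and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths


# Toy input: two overlapping exons plus one separate exon for "geneA".
toy_gtf = [
    "\t".join(["chr1", "toy", "exon", "100", "199", ".", "+", ".",
               'gene_id "geneA"; transcript_id "tx1";']),
    "\t".join(["chr1", "toy", "exon", "150", "249", ".", "+", ".",
               'gene_id "geneA"; transcript_id "tx2";']),
    "\t".join(["chr1", "toy", "exon", "400", "449", ".", "+", ".",
               'gene_id "geneA"; transcript_id "tx1";']),
]
toy_lengths = merged_exon_lengths(toy_gtf)
```

For a real annotation you would pass an open file handle instead of the toy 
list. The resulting lengths, keyed by gene ID, could then be matched to your 
gene list before supplying them as the length bias data.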

39 genes is not many, though. It probably wouldn't hurt to run the data 
through DAVID just to see if anything comes up. I've found DAVID 
pretty user-friendly.

Cheers,
Nadia.


