[BioC] Question about CSAMA10 "Lab-8-RNAseqUseCase.pdf" tutorial on bioconductor website.

Thu Sep 23 17:06:28 CEST 2010

Hi,

I've been going through this RNA-seq use case
(http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf)
with some data I have, and I'm wondering about section 2.4, where they
calculate gene expression by counting the number of reads that align
within the boundaries of a gene and then normalize these counts by the
length of the gene. Some of the code is as follows:

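# Gene boundaries, restricted to sequences present in the alignments
dmGeneBounds <- CSAMA10::geneBounds(dmTxDb)
dmGeneBounds <- dmGeneBounds[seqnames(dmGeneBounds) %in%
                             levels(seqnames(alnRanges))]
head(dmGeneBounds, 3)
# Number of aligned reads overlapping each gene
dmGeneCounts <- countOverlaps(dmGeneBounds, alnRanges)
# Normalize the counts by gene length
dmRPKM <- CSAMA10::rpkm(dmGeneCounts, dmGeneBounds)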
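
For reference, this is how I understand the normalization step. It is
only a sketch of the usual RPKM definition (reads per kilobase per
million mapped reads) in base R. The function name rpkm_sketch and the
toy numbers are made up for illustration, and I have not checked
whether CSAMA10::rpkm computes exactly this.

```r
# Hypothetical helper, NOT the CSAMA10 implementation.
# counts:      reads counted per feature
# lengths_bp:  feature lengths in base pairs
# total_reads: library size used for scaling (assumed: sum of counts)
rpkm_sketch <- function(counts, lengths_bp, total_reads = sum(counts)) {
  stopifnot(length(counts) == length(lengths_bp), all(lengths_bp > 0))
  counts / (lengths_bp / 1e3) / (total_reads / 1e6)
}

# Toy data (invented): two genes, one million mapped reads in total
toy_counts  <- c(geneA = 500, geneB = 1500)
toy_lengths <- c(geneA = 2000, geneB = 3000)
toy_rpkm <- rpkm_sketch(toy_counts, toy_lengths, total_reads = 1e6)
print(toy_rpkm)
```

Whatever length is passed in as lengths_bp directly scales the result,
which is why the choice between gene length and exon length matters.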

My question is: is this approach actually correct? Could you publish
results obtained with this method, or is it just meant as a simple
example?

I'm interested in the ranks of the genes in the samples for a
subsequent analysis, but I would have assumed that you'd have to count
the number of reads that map to the EXONS of each gene and normalize
by the length of the EXONS, rather than the length of the gene itself.
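
To make the concern concrete, here is a toy sketch in base R with
invented coordinates and read counts. It compares normalizing by the
full gene span with normalizing by the merged exon length. The helper
union_length and all of the data are hypothetical. With Bioconductor
objects this would presumably involve something like exonsBy() from
GenomicFeatures plus reduce() and width(), but that is an assumption
and is not tested here.

```r
# Total length covered by a set of intervals (1-based, inclusive ends),
# merging overlapping or adjacent intervals first. Hypothetical helper.
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      cur_end <- max(cur_end, end[i])            # extend the current block
    } else {
      total <- total + (cur_end - cur_start + 1) # close the current block
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Toy annotation (invented): geneA has two short exons and a long intron,
# geneB is a single exon.
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneB"),
  start = c(1, 901, 1),
  end   = c(100, 1000, 400)
)
by_gene <- split(exons, exons$gene)
span_len <- sapply(by_gene, function(g) max(g$end) - min(g$start) + 1)
exon_len <- sapply(by_gene, function(g) union_length(g$start, g$end))

# Toy read counts (invented), assumed to fall within exons only
reads <- c(geneA = 100, geneB = 120)[names(span_len)]

per_kb_span <- reads / (span_len / 1e3)  # normalized by gene span
per_kb_exon <- reads / (exon_len / 1e3)  # normalized by exon length

print(rank(-per_kb_span))
print(rank(-per_kb_exon))
```

In this toy case the two genes swap ranks depending on which length is
used, which is exactly what worries me for a rank-based analysis.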

If this is the case, is there a tutorial that shows how to do that?

-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com