[BioC] Question about CSAMA10 "Lab-8-RNAseqUseCase.pdf" tutorial on bioconductor website.

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Sep 23 17:57:09 CEST 2010


Hi,

On Thu, Sep 23, 2010 at 11:06 AM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
> Hi,
>
> I've been going through this RNA-seq use case
> (http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf)
> with some data I have and I'm wondering about section 2.4 where they
> calculate gene expression by counting the number of reads that alight
> to within the boundaries of a genes, then normalize these based on the
> length of the gene. Some of the code is as follows:
>
> dmGeneBounds <- CSAMA10::geneBounds(dmTxDb)
> dmGeneBounds <- dmGeneBounds[seqnames(dmGeneBounds) %in%
> levels(seqnames(alnRanges))]
> head(dmGeneBounds, 3)
> dmGeneCounts <- countOverlaps(dmGeneBounds, alnRanges)
> dmRPKM <- CSAMA10::rpkm(dmGeneCounts, dmGeneBounds)
>
> My question is, is this actually correct, could you publish using this
> method or is this just meant as a simple example?
>
> I'm interested in the ranks of the genes in the samples for a
> subsequent analysis, but I would have assumed that you'd have to count
> the number of reads that map to the EXONS of each gene and normalize
> by the length of the EXONS, rather then the gene itself?
>
> If this is the case I wonder if there a tutorial that shows how to do that...

If you get a bit comfortable with the GenomicFeatures package, you can
do it pretty easily. Assume you have your reads stored in some GRanges
object:

txdb <- loadFeatures('/path/to/your/txdb')
e <- exonsBy(txdb, 'gene')
normedCounts <- lapply(e, function(x) {
  countOverlaps(reads, x) / sum(width(x))
})

normedCounts now has the number of reads falling in each exon per gene
divided by the total exon-length of the gene.

You can run with that example to calculate a "proper" RPKM ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list