[BioC] Question about CSAMA10 "Lab-8-RNAseqUseCase.pdf" tutorial on bioconductor website.

Mon Sep 27 16:08:49 CEST 2010

Okay great, thanks Steve, this works nicely. It does seem to take a
long (2 days!) time though but I guess this is to be expected! (I'm
assuming this is why they used the simple version in the tutorial...

Paul

On Thu, Sep 23, 2010 at 4:57 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Thu, Sep 23, 2010 at 11:06 AM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
>> Hi,
>>
>> I've been going through this RNA-seq use case
>> (http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf)
>> with some data I have and I'm wondering about section 2.4 where they
>> calculate gene expression by counting the number of reads that alight
>> to within the boundaries of a genes, then normalize these based on the
>> length of the gene. Some of the code is as follows:
>>
>> dmGeneBounds <- CSAMA10::geneBounds(dmTxDb)
>> dmGeneBounds <- dmGeneBounds[seqnames(dmGeneBounds) %in%
>> levels(seqnames(alnRanges))]
>> head(dmGeneBounds, 3)
>> dmGeneCounts <- countOverlaps(dmGeneBounds, alnRanges)
>> dmRPKM <- CSAMA10::rpkm(dmGeneCounts, dmGeneBounds)
>>
>> My question is, is this actually correct, could you publish using this
>> method or is this just meant as a simple example?
>>
>> I'm interested in the ranks of the genes in the samples for a
>> subsequent analysis, but I would have assumed that you'd have to count
>> the number of reads that map to the EXONS of each gene and normalize
>> by the length of the EXONS, rather then the gene itself?
>>
>> If this is the case I wonder if there a tutorial that shows how to do that...
>
> If you get a bit comfortable with the GenomicFeatures package, you can
> do it pretty easily. Assume you have your reads stored in some GRanges
> object:
>
> txdb <- loadFeatures('/path/to/your/txdb')
> e <- exonsBy(txdb, 'gene')
> normedCounts <- lapply(e, function(x) {
>  countOverlaps(reads, x) / sum(width(x))
> })
>
> normedCounts now has the number of reads falling in each exon per gene
> divided by the total exon-length of the gene.
>
> You can run with that example to calculate a "proper" RPKM ...
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>

-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com