[BioC] Rsubread vs. BWA, Bowtie, etc. and RPKM vs. normalized counts

Wei Shi shi at wehi.EDU.AU
Tue Oct 11 00:26:55 CEST 2011


Dear Tim,

Yes, Subread maps a read by extracting a set of 16-mers from it, mapping
each 16-mer to the genome, counting the number of mapped 16-mers at
each candidate location, and then choosing the location that is mapped to
by the majority of the 16-mers as the mapping location of the read. The
fundamental difference between Subread and other aligners is that it uses
a voting method rather than an extension method to determine the mapping
locations of the reads, which makes it a lot faster and more sensitive.
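
The voting idea can be illustrated with a toy sketch in Python. This is not
Subread's actual implementation: the function names (build_index, map_read),
the plain hash-table index, and the tiny genome with k=4 are all assumptions
made for illustration. It only shows how each matching k-mer votes for the
read start position it implies, and how the location with the most votes wins.

```python
from collections import Counter, defaultdict

def build_index(genome, k=16):
    """Map every k-mer in the genome to the positions where it starts.
    (Hypothetical helper; a real aligner uses a far more compact index.)"""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index

def map_read(read, index, k=16):
    """Return (best_location, votes), or (None, 0) if no k-mer of the read hits."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            # A k-mer found at genome position `pos` and read offset `offset`
            # votes for the read starting at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

# Toy usage with k=4 so the example stays readable (Subread uses 16-mers).
genome = "TTGACCGTAGCATCGGAACTTGCA"
index = build_index(genome, k=4)
print(map_read("CGTAGCATCGGA", index, k=4))  # exact copy of genome[5:17] -> (5, 9)
print(map_read("CGTAGTATCGGA", index, k=4))  # one mismatch, still mapped -> (5, 5)
```

In the second call the mismatch destroys four of the nine 4-mers, but the
remaining five still agree on location 5, so the read is mapped without any
extension step.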

Subread has both a C version and an R version. The C version is freely
available from SourceForge and the R version is included in the Rsubread
package in Bioconductor.

The Rsubread package also includes a function called featureCounts, which
can be used to count the number of reads for each exon or gene. So this
function will be useful for you to look at the differential expression at
both gene level and exon level.
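
As a rough picture of what counting reads per exon and per gene means, here
is a minimal Python sketch. It is not the featureCounts implementation and
does not reproduce its options or assignment rules. The function name
count_reads, the tuple layout of the annotation, and the simple "any overlap
counts" rule are assumptions made for illustration.

```python
from collections import Counter

def count_reads(read_starts, read_len, exons):
    """Count reads overlapping each exon and each gene.

    exons: list of (gene_id, exon_id, start, end), 0-based, end-exclusive.
    Hypothetical simplification: any overlap counts, strand is ignored, and a
    read is counted at most once per gene even if it touches several exons.
    """
    exon_counts = Counter()
    gene_counts = Counter()
    for start in read_starts:
        end = start + read_len
        genes_hit = set()
        for gene_id, exon_id, ex_start, ex_end in exons:
            if start < ex_end and ex_start < end:  # intervals overlap
                exon_counts[(gene_id, exon_id)] += 1
                genes_hit.add(gene_id)
        for gene_id in genes_hit:
            gene_counts[gene_id] += 1
    return exon_counts, gene_counts

# Toy annotation and 36 bp single-end reads (all values made up).
exons = [("geneA", "e1", 100, 200),
         ("geneA", "e2", 300, 400),
         ("geneB", "e1", 1000, 1100)]
read_starts = [90, 150, 180, 290, 390, 500, 1090]
exon_counts, gene_counts = count_reads(read_starts, 36, exons)
print(dict(exon_counts))
print(dict(gene_counts))
```

The exon-level table is the kind of input used for differential exon usage,
while the gene-level table is the kind used for gene-level differential
expression.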

Another function which might be useful for your data analysis is the
subjunc function, which is designed to discover exon junctions. Subjunc
uses an idea similar to that of Subread. Our preliminary results showed
that subjunc outperformed competing junction detectors in terms of speed,
sensitivity and accuracy.

The devel version of the Rsubread package includes a lot of our recent
development work on both the Subread aligner and the Subjunc junction
detector, so I would recommend using the devel version if you want to try
the Rsubread package.


Hope this helps.

Cheers,
Wei





> A professor sent me a bunch of raw RNA-seq reads (as FASTQ files) and I
> want to align them, and I couldn't really make heads or tails of the
> options, so I listened to what Phil Green told me at a conference and
> looked around for a sensible word-nucleated aligner like he described.
> It seems that Rsubread works this way?
>
> http://sourceforge.net/projects/subread/
>
> I would like BAM files as intermediate output, but my real interest is
> differential exon usage in differentiating cells.  Given that the reads I
> have to align are relatively short (36bp, SE), is there an advantage or
> disadvantage in using subread compared to other options?  And when I'm
> done trimming and aligning, I could choose raw counts, conditional
> quantile normalized counts, or something like RPKM to summarize how often
> a given exon seems to have been transcribed.  I read this:
>
> http://seqanswers.com/forums/showthread.php?t=586
>
> and I see that packages using a Gamma prior for the dispersion of a
> Poisson count model benefit from having raw counts.  If I am after
> correlated changes in exon usage depending on other sequence features, is
> it reasonable to use (say) 'cqn' on the raw counts, then log-transform and
> work with those normalized counts?
>
> Thanks for any suggestions,
>
> --
> Tim Triche, Jr.
> USC Biostatistics


