[BioC] RNAseq less sensitive than microarrays? Is it a statistical issue?

Lucia luciap at iscb.org
Thu May 16 19:25:24 CEST 2013


Thanks, everyone, for the helpful input.

My counts are in fact total counts per transcript, not averages; that was my mistake in the original post.

I have an average of 70 million 100 bp read pairs per sample, with very even read numbers across all my samples.

I am curious, though, about how many clonal reads is too many. I always see quite a lot of them in my samples. However, I have heard mixed advice about filtering duplicates in RNA-seq, since at high coverage some of the duplication is expected to reflect true abundance. In my samples, filtering duplicates reduces the dynamic range dramatically, and for highly expressed transcripts a high proportion of the reads that map to them have two or more copies.
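
To get a feel for how much duplication is expected by chance alone, here is a minimal Python sketch under a strong simplifying assumption: read start positions fall uniformly at random over a transcript's possible start sites. Real coverage is not uniform, and paired-end duplicates are usually defined by both fragment ends, so this is only a rough guide. The function name and the numbers in the usage lines are hypothetical.

```python
def expected_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads whose start site is already occupied by
    another read, assuming starts are uniform over n_positions sites
    (a hypothetical, simplified model)."""
    if n_reads <= 0:
        return 0.0
    # Expected number of distinct start sites hit by n_reads uniform draws.
    distinct = n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)
    # Every read beyond the first one at a site counts as a duplicate.
    return 1.0 - distinct / n_reads


if __name__ == "__main__":
    # A hypothetical 2 kb transcript at increasing expression levels.
    for reads in (100, 1_000, 10_000, 100_000):
        frac = expected_duplicate_fraction(reads, 2_000)
        print(f"{reads:>7} reads -> {frac:.1%} duplicates by chance")
```

Under this model, a transcript with tens of thousands of reads is almost entirely "duplicates" even without any PCR artifact, which is consistent with duplicate removal compressing the dynamic range at the high end.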

Thanks very much for all the help and advice.

Lucia


On May 15, 2013, at 4:33 PM, Wolfgang Huber <whuber at embl.de> wrote:

> Dear Lucia
> 
> there are many reasons why you might see fewer differentially expressed genes in your RNA-Seq data. For example:
> - your post doesn't say what the number of fragments per sample is ("50X pair end" is not a useful statement for RNA, since coverage differs by orders of magnitude between transcripts), and it might indeed just be too low.
> - the data quality of the microarray libraries might happen to be better than the RNA-Seq library in your experiment (e.g. do you see many PCR duplicates in the latter?)
> - the asymmetry in power to detect up- vs down-regulated genes might be caused by unequal library sizes.
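
As an illustration of correcting for unequal library sizes, the following is a pure-Python sketch of the median-of-ratios idea that DESeq uses to estimate size factors. It is not DESeq's code; the function name and toy data are hypothetical. Normalized counts would then be the raw counts divided by each sample's factor.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (a sketch of the DESeq idea).

    counts: list of genes, each a list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Genes with a zero in any sample have a geometric mean of zero; skip them.
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each factor is the median ratio of a sample to the geometric-mean pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
    toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
    print(size_factors(toy))  # the second factor is twice the first
```

The median makes the estimate robust to a minority of genes that are genuinely differentially expressed, which a plain ratio of total counts is not.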
> 
>> For the record my count matrices are of counts of transcripts, averaging
>> counts over all exons from the same gene model for all RefSeq genes.
> 
> I'm afraid this is not correct. The documentation of the DESeq and edgeR packages explicitly states [1] that you need to use the sum of the counts mapping to the gene (or transcript), not some average of per-exon values. This could well be the main reason for your weaker results.
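
The difference matters because count-based methods such as edgeR and DESeq infer the noise level from the size of the counts themselves. A minimal sketch with hypothetical exon and gene names, considering Poisson shot noise only (ignoring biological overdispersion), shows what averaging does:

```python
import math
from collections import defaultdict


def gene_counts(exon_counts, exon_to_gene, how="sum"):
    """Collapse per-exon counts to per-gene values.

    how="sum" keeps the count scale that count-based models expect;
    how="mean" reproduces the per-exon averaging discussed in the thread.
    Caveat: summing exon counts counts a read spanning two exons twice;
    counting reads per gene directly avoids that.
    """
    totals = defaultdict(int)
    n_exons = defaultdict(int)
    for exon, count in exon_counts.items():
        gene = exon_to_gene[exon]
        totals[gene] += count
        n_exons[gene] += 1
    if how == "sum":
        return dict(totals)
    if how == "mean":
        return {g: totals[g] / n_exons[g] for g in totals}
    raise ValueError(f"unknown method: {how}")


def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_count)


if __name__ == "__main__":
    # Hypothetical gene with 10 exons of 50 reads each.
    exons = {f"ex{i}": 50 for i in range(10)}
    mapping = {e: "geneA" for e in exons}
    total = gene_counts(exons, mapping, "sum")["geneA"]   # 500
    avg = gene_counts(exons, mapping, "mean")["geneA"]    # 50.0
    # A model fed the average assumes about sqrt(10) times more relative
    # shot noise than the data actually carry, which costs power.
    print(total, avg, poisson_cv(avg) / poisson_cv(total))
```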
> 
> As far as I can see, differences such as you report have nothing to do with independent filtering or multiple testing adjustment. The former, if done correctly, will increase, not decrease, the number of hits; the latter is independent of technology (i.e. the same as with microarrays).
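
A toy illustration of why independent filtering tends to increase the number of hits: the Benjamini-Hochberg adjustment scales each p-value by the number of tests, so dropping tests that had little chance of being significant makes the remaining adjusted p-values smaller. This is a minimal pure-Python sketch with made-up numbers, not the filtering code in DESeq or edgeR. The key assumption is that the filter statistic (here the mean count over all samples) does not use the condition labels.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def n_hits(pvalues, alpha=0.05):
    """Number of tests with BH-adjusted p-value at or below alpha."""
    return sum(q <= alpha for q in bh_adjust(pvalues))


if __name__ == "__main__":
    # Hypothetical (mean count over all samples, p-value) pairs.
    genes = [(500, 0.001), (300, 0.004), (200, 0.02),
             (2, 0.3), (3, 0.4), (2, 0.6), (1, 0.7), (1, 0.9)]
    all_p = [p for _, p in genes]
    # Filter on overall mean count, which ignores the condition labels.
    kept_p = [p for mean, p in genes if mean >= 10]
    print(n_hits(all_p), n_hits(kept_p))  # 2 3
```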
> 
>    Hope this helps -
>    Wolfgang
> 
> [1] For the record: http://bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf --> Section 1.1 on page 2.
> 
> 
> On May 15, 2013, at 10:01 pm, Lucia Peixoto <luciap at iscb.org> wrote:
> 
>> Dear All,
>> 
>> I have a dataset for which I have two conditions. I have 9 replicates per
>> group for microarrays, 5 per group for RNAseq (which are a subset of the
>> RNA samples used in the microarrays, couldn't sequence all 9), and 8 per
>> group for qPCR (which is an independent set of experiments).
>> Each n is an independent mouse, from an independent day and an independent
>> experiment, so one experiment yields n=1 for each of the groups.
>> However, the correlation between control and treatment within the same day
>> is no better than across days.
>> 
>> Theoretically they all measure the same biological phenomenon, which is
>> gene expression changes, so I have been doing some comparisons between them
>> to try to get at the truth of what is really being differentially
>> expressed. In particular, I have focused on the 5 samples in each of the
>> three groups in which the only difference is whether the RNA was hybridized
>> to a microarray or sequenced.
>> 
>> To my surprise, the gene list obtained from analyzing differential
>> expression with RNASeq (using either edgeR or DESeq) is considerably
>> smaller than the one obtained from microarray analysis (using locfdr on
>> pairwise t-statistics) at the same FDR. The RNASeq list is included in the
>> microarray list, but there are several differences I have validated by qPCR
>> that the RNASeq analysis is not able to detect at a reasonable FDR.
>> Moreover, there seems to be an unusual bias against detecting
>> down-regulated genes. I am a little puzzled by this, since one of the
>> reasons we are sequencing is that it is supposed to have a better
>> dynamic range.
>> 
>> These are the same RNA samples, so this apparent lack of sensitivity has to
>> be related to either library prep or statistical analysis. These are my
>> questions:
>> 
>> - Can the inability to detect down-regulated genes be related to
>> filtering out genes with low counts? (To get good separation between groups
>> in an MDS plot, I need to filter at cpm > 0.1.)
>> - Is it possible that I need more coverage to improve sensitivity? I am
>> currently sequencing at 50X pair end, which seemed enough. Is there any
>> published study looking at RNASeq sequencing depth and sensitivity in human
>> or mouse genomes?
>> - Are the multiple testing corrections applied in edgeR and DESeq too
>> stringent, rendering the overall analysis less sensitive?
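
For reference, the cpm filter mentioned in the first question can be sketched as below (pure Python; the function names are hypothetical and this is not edgeR's implementation). Because the threshold is applied per sample, it helps to translate it into reads: with about 70 million fragments per sample, cpm > 0.1 corresponds to roughly 7 fragments. Requiring the threshold in at least as many samples as the smallest group (here 5) is a common heuristic, not a rule from this thread.

```python
def cpm(counts):
    """Counts per million: each count divided by its sample's total, times 1e6.

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
            for row in counts]


def keep_genes(counts, min_cpm, min_samples):
    """Indices of genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return [i for i, row in enumerate(cpm(counts))
            if sum(v > min_cpm for v in row) >= min_samples]


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes x 2 samples (library sizes 1000 and 2000).
    toy = [[10, 20], [0, 1], [990, 1979]]
    print(keep_genes(toy, min_cpm=1000, min_samples=2))  # [0, 2]
```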
>> 
>> For the record my count matrices are of counts of transcripts, averaging
>> counts over all exons from the same gene model for all RefSeq genes. I did
>> this because the microarray data is per transcript. On the log scale, the
>> squared correlation (R^2) between microarray intensity and RPKM from the
>> same sample is 0.7 on average.
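
To make the RPKM side of that comparison concrete, here is a minimal sketch of RPKM and of a log-scale R^2 between two expression vectors. It assumes "R2" means the squared Pearson correlation of log-transformed values and uses a pseudocount of 1 to handle zeros; both are assumptions, and the function names are hypothetical. RPKM is meant for comparing expression within a sample; for differential expression, edgeR and DESeq take the raw counts.

```python
import math


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def log_r_squared(x, y, pseudocount=1.0):
    """Squared Pearson correlation of log2(x + pseudocount) and log2(y + pseudocount)."""
    lx = [math.log2(v + pseudocount) for v in x]
    ly = [math.log2(v + pseudocount) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical: 1,000 reads on a 2 kb transcript in a 10-million-read library.
    print(rpkm(1_000, 2_000, 10_000_000))  # 50.0
```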
>> 
>> Thanks for the insight!
>> 
>> Lucia
>> 
>> -- 
>> Lucia Peixoto PhD
>> Postdoctoral Research Fellow
>> Laboratory of Dr. Ted Abel
>> Department of Biology
>> School of Arts and Sciences
>> University of Pennsylvania
>> 
>> "Think boldly, don't be afraid of making mistakes, don't miss small
>> details, keep your eyes open, and be modest in everything except your
>> aims."
>> Albert Szent-Györgyi
>> 
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 


