[BioC] RNA-seq differentially expressed gene finding methods

Paul Geeleher paulgeeleher at gmail.com
Fri Sep 5 19:05:06 CEST 2014


Hi Son,

My understanding is that the approach you describe could be considered
valid given a large enough number of samples. However, RNA-seq
experiments typically have fewer than 30 samples per condition, and at
those sample sizes a t-test is hard to defend because RNA-seq data are
not normally distributed. Even where enough samples make the t-test
defensible, it is very difficult to justify using it when much
better-powered methods have been developed specifically for this type
of data.
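
As a rough illustration of the distributional point, here is a small
simulation sketch. It is my own toy example, not code from edgeR or
DESeq: the helper names, the mean of 10 and the dispersion of 0.5 are
all made-up assumptions. It draws negative binomial counts as a
gamma-Poisson mixture and compares the sample variance with the mean,
which for such counts is expected to follow mu + dispersion * mu^2
rather than staying constant.

```python
import math
import random
import statistics


def nb_variance(mu, dispersion):
    """Variance of a negative binomial with mean mu: mu + dispersion * mu**2."""
    return mu + dispersion * mu ** 2


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb(mu, dispersion, rng):
    """Draw one negative binomial count as a gamma-Poisson mixture."""
    shape = 1.0 / dispersion
    scale = mu * dispersion
    # The gamma-distributed rate has mean mu and variance dispersion * mu**2.
    rate = rng.gammavariate(shape, scale)
    return sample_poisson(rate, rng)


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
mu, dispersion = 10.0, 0.5

counts = [sample_nb(mu, dispersion, rng) for _ in range(5000)]
sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)

print("sample mean:", round(sample_mean, 2))
print("sample variance:", round(sample_var, 2))
print("variance expected under the model:", nb_variance(mu, dispersion))
```

The variance comes out several times larger than the mean and grows
with it, which is the behaviour the count-based packages model
explicitly and a plain per-gene t-test does not.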

Paul

On Fri, Sep 5, 2014 at 11:52 AM, Richard Friedman
<friedman at c2b2.columbia.edu> wrote:
> Dear Son,
>
>         The t-test assumes a normal distribution,
> which is appropriate for continuous variables. RNA-seq
> data consist of counts (discrete entities). A negative binomial distribution
> (edgeR, DESeq) or a mean-dependent variance model (voom)
> is much more appropriate. Also, the three methods mentioned
> above estimate variability better, using information from all genes
> via empirical Bayes methods, than does the one-gene-at-a-time
> frequentist t-test.
>
> Best wishes,
> Rich
>
> On Sep 5, 2014, at 12:44 PM, Son Pham wrote:
>
>> Dear all,
>> I know that we have very good packages (edgeR, DESeq) that calculate
>> the list of differentially expressed genes in 2 conditions (with
>> replicates) from raw counts. But I do not know what is wrong with the
>> following simple approach (and whether other people have been using it):
>>
>> 1. Get the (estimated) TPM/FPKM for each gene in each sample
>> 2. Do a t-test between the two groups for each gene.
>> 3. Adjust the p-values for multiple testing (p-adj)
>>
>>
>> Thanks,
>>
>> Son.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



-- 
Dr. Paul Geeleher, PhD
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St.,
KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com


