[BioC] DeSeq vs current version of Cuffdiff

Martin Morgan mtmorgan at fhcrc.org
Wed Feb 15 19:32:49 CET 2012


On 02/14/2012 06:02 AM, Stephen Turner wrote:
> I'm also in the same boat as Rich. I run a new bioinformatics core
> here and I'm building a pipeline for RNA-seq. Cufflinks for some time
> has supported biological replicates, and I'm also curious about the
> relative merits of using
> bowtie/tophat-cufflinks-cuffmerge-cuffdiff-cummeRbund versus us(ing
> tophat-HTSeq?-customScriptForCreatingMatrix?-DESeq. Cufflinks also

The cuffdiff route provides analysis of two-group experiments, whereas a 
counts table provides the raw material for richer experimental designs, 
clustering / classification, exploratory analysis, methods development, 
etc. The counts table route is more challenging in the initial stages of 
de novo discovery.

> gives me a host of other tests (differential splicing load,
> differential TSS usage, differential coding output, etc), which also
> seem useful for certain applications.
>
> On a related note, does anyone have a workflow for taking multiple bam
> files, running HTSeq-count (or another program), plus some other
> program or custom script, to produce a matrix of counts as input to
> DESeq?


In addition to Nico's suggestion, the work flow for 'counting' likely 
involves annotation input (via rtracklayer;:import, TxDb.* packages, 
GenomicFeatures::make*, biomaRt, etc); BAM input (via Rsamtools, 
GenomicRanges::readGappedAlignments, etc), and simple counting (via 
countOverlaps or GenomicRanges::summarizeExperiment). A simple(istic?) 
counter for unique hits to single end reads is

   library(GenomicRanges)
   ## read annotations anno into GRanges or GRangesList
   ## point to BAM files as (named) character vector
   counter <- function(file, anno) {
      ga <- readGappedAlignments(file)
      hits <- countOverlaps(ga, anno)
      countOverlaps(anno, ga[hits==1])
   }
   counts <- sapply(files, counter, annotations)

If rows of counts represent genes in a DESeq / edgeR analysis, then 
perhaps the 'hits' part of the counter can be omitted. Preceding this 
with library(parallel); sapply <- function(...) 
simplify2array(parallel::mclapply(...)) makes this run in parallel 
(though memory consumption might become an issue).

Martin

>
> Stephen
>
> -----------------------------------------
> Stephen D. Turner, Ph.D.
> bioinformatics at virginia.edu
> Bioinformatics Core Director
> University of Virginia School of Medicine
> bioinformatics.virginia.edu
>
> On Tue, Feb 14, 2012 at 6:00 AM,<bioconductor-request at r-project.org>  wrote:
>>
>> Message: 6
>> Date: Mon, 13 Feb 2012 09:28:10 -0800
>> From: "Tim Triche, Jr."<tim.triche at gmail.com>
>> To: Richard Friedman<friedman at cancercenter.columbia.edu>
>> Cc: Bioconductor mailing list<bioconductor at r-project.org>
>> Subject: Re: [BioC] DeSeq vs current version of Cuffdiff
>> Message-ID:
>>         <CAC+N9BWr30EvfZ5rH7hpNF-k9uQ=B4n_SqGr0B49XGmW89JBmg at mail.gmail.com>
>> Content-Type: text/plain
>>
>> Not directly relevant to gene-level RNA-seq DE calls, but rather for
>> exon-level DE,
>> I found it useful to read this:
>> http://precedings.nature.com/documents/6837/version/1
>> In particular, section 4.3 on page 11, and supplementary figures S7 and S8
>> on page 19.
>>
>> I was informed by a coworker that since everyone uses
>> BowTie-TopHat-Cufflinks-Cuffdiff, it is the sensible thing to do.
>> Conversations with people who know what they are doing (Terry Speed&
>> BCGSC) suggest the matter is not yet settled.
>> So I retrieved ~1TB of BAMs, extracted the reads, and started looking into
>> how that compares to DEXSeq and/or subread.
>>
>> It would be incredibly informative if the Cufflinks and DEXSeq authors had
>> time to weigh in on their strengths/weaknesses. DEXSeq&  cummeRbund both
>> offer nice tools for exploring the results; I am curious which pipeline
>> fits best for my needs.
>>
>> Thanks for bringing this up.
>>
>>
>> On Mon, Feb 13, 2012 at 8:01 AM, Richard Friedman<
>> friedman at cancercenter.columbia.edu>  wrote:
>>
>>> Dear Bioconductor list,
>>>
>>>         Sometime ago Simon Anders explained the difference
>>> between DeSeq and Cuffdiff as follows:
>>>
>>> "If you have two samples, cuffdiff tests, for each transcript, whether
>>> there is evidence that the concentration of this transcript is not the
>>> same in the two samples.
>>>
>>> If you have two different experimental conditions, with replicates for
>>> each condition, DESeq tests, whether, for a given gene, the change in
>>> expression strength between the two conditions is large as compared to
>>> the variation within each replicate group."
>>>
>>> Current language on the Cuffdiff site suggests that the current version
>>> of that program  tests for whether the change is significant compared to
>>> changes in each condition.
>>>
>>> http://cufflinks.cbcb.umd.edu/**howitworks.html#hdif<http://cufflinks.cbcb.umd.edu/howitworks.html#hdif>
>>>
>>> http://cufflinks.cbcb.umd.edu/**howitworks.html#reps<http://cufflinks.cbcb.umd.edu/howitworks.html#reps>
>>>
>>> Can someone please comment on the relative merits of Cuffdiff and
>>> DeSeq. I ask here because our sequencing core delivers results
>>> based on Cuffdiff and I want to know if I should redo it using
>>> DeSeq,I would greatly appreciate any guidance in this matter.
>>>
>>> Thanks and best wishes,
>>> Rich
>>> ------------------------------**------------------------------
>>> Richard A. Friedman, PhD
>>> Associate Research Scientist,
>>> Biomedical Informatics Shared Resource
>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>> Lecturer,
>>> Department of Biomedical Informatics (DBMI)
>>> Educational Coordinator,
>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>>> Room 824
>>> Irving Cancer Research Center
>>> Columbia University
>>> 1130 St. Nicholas Ave
>>> New York, NY 10032
>>> (212)851-4765 (voice)
>>> friedman at cancercenter.**columbia.edu<friedman at cancercenter.columbia.edu>
>>> http://cancercenter.columbia.**edu/~friedman/<http://cancercenter.columbia.edu/~friedman/>
>>>
>>> I am a Bayesian. When I see a multiple-choice question on a test and I
>>> don't
>>> know the answer I say "eeney-meaney-miney-moe".
>>>
>>> Rose Friedman, Age 14
>>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list