[BioC] Can DESeq do this?

Wed Aug 6 20:19:54 CEST 2014

It's not that I don't recommend DESeq; it's that I don't recommend a
DE analysis for anything but exploration / hypothesis generation (we
say as much in the paragraph I referred to earlier), especially
considering that the sequencing facilities are different for the
different samples. This is referred to as a batch effect, and in this
case it is perfectly confounded with the condition of interest: the
site of sampling. Suppose you were to take the same sample and
prepare different libraries or perform sequencing at different
facilities in the US, Japan, etc., and then perform statistical
testing: you would often find many significant differences in such an
analysis. So your dataset is a mix of batch effects and biologically
interesting differences, which cannot be disentangled because they are
perfectly confounded.
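
To make the confounding concrete, here is a minimal illustrative sketch
in Python (not DESeq code; the facility labels F1..F4 and the helper
names are hypothetical). It builds a 0/1 indicator design matrix for
site and facility and checks its rank. When every site has its own
facility, the matrix has more columns than its rank, so site and
facility effects cannot be estimated separately. A crossed layout,
which this dataset does not have, would be full rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(sites, facilities):
    """Intercept plus indicator columns for each non-reference level."""
    site_levels = sorted(set(sites))[1:]      # first level is the reference
    fac_levels = sorted(set(facilities))[1:]  # first level is the reference
    return [
        [1]
        + [int(s == level) for level in site_levels]
        + [int(f == level) for level in fac_levels]
        for s, f in zip(sites, facilities)
    ]


# Layout as described in this thread: one sample per site, and each site
# sequenced somewhere different (facility labels are assumed).
confounded = design_matrix(
    ["mine", "US", "Japan", "Iceland"],
    ["F1", "F2", "F3", "F4"],
)

# Hypothetical crossed layout: every site sequenced at every facility.
crossed = design_matrix(
    ["mine", "mine", "US", "US"],
    ["F1", "F2", "F1", "F2"],
)

for name, design in [("confounded", confounded), ("crossed", crossed)]:
    print(name, "rank", matrix_rank(design), "of", len(design[0]), "columns")
```

With the layout described in this thread, the site and facility columns
are identical, so no model can attribute a difference to one rather
than the other.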

Mike

On Wed, Aug 6, 2014 at 2:08 PM, Pet Chiang <sdpapet at gmail.com> wrote:
> Hi Michael,
>
> No, they were sequenced at different places. So you don't recommend
> using DESeq for the analysis?
>
> Ben
>
>
> On Wed, Aug 6, 2014 at 12:03 PM, Michael Love <michaelisaiahlove at gmail.com>
> wrote:
>>
>> hi Ben,
>>
>> One problem with this comparison is that, without replicates from at
>> least one site, the statistical methods have no way of assessing the
>> biological and technical variability of the experiment. Just like with
>> a t-test, the answer to the question "is 2 < 6?" depends on how much
>> variability we expect from sampling again and again. For more
>> information, read the
>> paragraph "Experiments without replicates..." in ?DESeq.
>>
>> Were the samples from the different sites prepared and sequenced at
>> the same facility?
>>
>> As for the technical aspects of using DESeq/edgeR for metagenomics,
>> Joey McMurdie has comprehensive instructions here:
>> http://joey711.github.io/phyloseq/
>>
>> Mike
>>
>> On Wed, Aug 6, 2014 at 1:42 PM, Pet Chiang <sdpapet at gmail.com> wrote:
>> > I am working on my metagenomic data sets.
>> >
>> > I have annotated my metagenome against the COG database. I would like to
>> > use DESeq to look for the overabundant genes in my site.
>> >
>> > Here is the problem: I only have one site (one metagenome). I would like
>> > to compare this one to different sites (none of which has replicates
>> > either).
>> >
>> > The count data set looks like this:
>> >
>> > function name   my site      site 1 (from US)   site 2 (from Japan)   site (from Iceland)   .....
>> > COG1            2 (counts)   6                  9                     9
>> > COG2            5            5                  8                     8
>> > COG3            7            9                  8                     0
>> > .....
>> >
>> > I want to find out whether any COG functions in my site are
>> > over-represented, meaning that their gene counts are higher than
>> > in the other sites.
>> >
>> > However, I am not sure whether DESeq can do this.
>> >
>> > If it can, how should I set up the groups?
>> >
>> > Best regards,
>> > Ben