[BioC] Problem elaborating contrasts for DESeq2 analysis

Michael Love michaelisaiahlove at gmail.com
Tue Jul 22 17:24:38 CEST 2014


hi Marine,

I heard about some stalled runs for v1.2 (but only heard about them
after the release cycle had ended). I haven't heard of any stalled
runs for the current release.

anyway, I believe the stalls were related to rows with very low counts,
but I'm not certain. You might try first subsetting: dds =
dds[rowMeans(counts(dds)) > 5,].  If this doesn't work you'll have to
upgrade to the release version.


On Tue, Jul 22, 2014 at 11:18 AM, Marine Rohmer
<Marine.Rohmer at mgx.cnrs.fr> wrote:
> Hi Michael,
>
> First of all thank you a lot for your answer!
> So you think that the "_vs_" names are only due to the older version ?
> On my workplace I can't update these packages but I will try to install them
> on my own computer in the latest version, and see if it goes better.
>
> Waiting for that, I've tried as you suggested :
>
> dds <-
> DESeqDataSetFromMatrix(countData=d2$counts,colData=descexp,design=~condition+0)
> dds <- DESeq(dds, betaPrior=FALSE)
>
> This second line turns but never ends (At least within 1 hour).
>
> Thank you for your attention,
>
> Best regards,
>
> Marine
>
>
> Le 2014-07-21 16:32, Michael Love a écrit :
>
>> hi Marine,
>>
>> Note that you are using an older version of DESeq2 and Bioconductor
>> (DESeq2 v1.4 was released in April 2014). So if you are searching by
>> Google for documentation, it might not apply to your software (we
>> print the version number on the first page of the vignettes). You can
>> always get the correct documentation by using:
>> browseVignettes("DESeq2") and ?results for the man page for the
>> results() function.
>>
>> If you can update to the release version of Bioconductor (by updating
>> to R 3.1.0), you can obtain the contrast you are interested in with:
>>
>> results(dds, contrast=list(c("conditionA1","conditionA2","conditionA3"),
>> c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3,
>> -1/3))
>>
>> If you aren't able to update to R 3.1.0 and the release version of
>> Bioconductor, you can hack your contrast of interest for v1.2 like so:
>>
>> design(dds) <- ~ condition + 0
>> dds <- DESeq(dds, betaPrior=FALSE)
>> # note that the resultsNames(dds) will not be labelled correctly
>> # with design having + 0 (they still have the _vs_ in the names)
>> # but the following provides the correct contrast
>> attr(dds, "modelMatrix")
>> results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3)
>>
>> Mike
>>
>> On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer
>> <Marine.Rohmer at mgx.cnrs.fr> wrote:
>>>
>>> Dear bioconductor mailing list,
>>>
>>> I want to do a differential analysis using the GLM with DESeq2.
>>> We have 18 RNA samples, including 6 samples subjected to treatment A, 6
>>> others subjected to treatment B, and the last 6 subjected to treatment C.
>>> Each treatment group is divided into 3 colonies, in which one we have 2
>>> replicates (leading us to 6 samples per treatment).
>>> Colonies A1, A2 and A3 have been treated with the A treatment, colonies
>>> B*
>>> with the B treatment, and colonies C* with the C treatment.
>>> (A picture of the experiment design is attached to this email).
>>>
>>> These treatments are what we want to study, being aware of the "colony
>>> effect".
>>>
>>> It is important to note that both factors "colony" and "treatment" are in
>>> linear correlation, so that the classic contrast "A - B" does not work
>>> (-->
>>> "design matrix not of full rank").
>>> That's why I want to study, for example, the contrast : (A1+A2+A3 -
>>> B1-B2-B3)/3 .
>>>
>>> I've followed the DESeq2 documentation, and it seems I can do that with
>>> the
>>> numeric contrast vector as described in the 3.2. section.
>>> My problem is that when typing "resultsNames(ddsCtrst)", I don't have
>>> "A1",
>>> "A2", "A3" etc... as described in the doc, but directly some "A2_vs_A1",
>>> "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I
>>> wanted.
>>> I don't know why I have these "_vs_something" because I did not write
>>> them
>>> myself.
>>>
>>> This is my code :
>>>
>>> ### Preparing the analysis
>>> library(DESeq2)
>>> # Using edgeR to have the readDGE function
>>> library(edgeR)
>>>
>>> # fileT is my target.txt tabulated file, containing :
>>> # files group   description     colony
>>> # path/to/countsFile.txt        A       A1_1    A1
>>> # path/to/countsFile.txt        A       A1_2    A1
>>> # path/to/countsFile.txt        A       A2_1    A2
>>> # etc...
>>>
>>> targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header =
>>> TRUE,
>>> comment.char="" )
>>> d2 <- readDGE(targets, columns=c(1,2), header = FALSE)
>>> d.RLE <- d2
>>> d.RLE <- calcNormFactors(d.RLE,method="RLE")
>>>
>>> ### Preparing the design and contrast
>>> descexp=c(d2$samples$colony)
>>> # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc...
>>> descexp=data.frame(condition=descexp)
>>>
>>>> descexp
>>>
>>>
>>>    condition
>>> 1         A1
>>> 2         A1
>>> 3         A2
>>> 4         A2
>>> 5         A3
>>> 6         A3
>>> 7         B1
>>> 8         B1
>>> 9         B2
>>> 10        B2
>>> 11        B3
>>> 12        B3
>>> 13        C1
>>> 14        C1
>>> 15        C2
>>> 16        C2
>>> 17        C3
>>> 18        C3
>>>
>>> dds <- DESeqDataSetFromMatrix(countData =
>>> d2$counts,colData=descexp,design=~condition)
>>>
>>> ### Checking the condition : each colony in two replicates
>>>>
>>>>
>>>> dds$condition
>>>
>>>
>>>  [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3
>>> Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3
>>>
>>> ### Analysis
>>> dds <- DESeq(dds)
>>>
>>> ### Now when typing resultsNames(dds) :
>>>>
>>>>
>>>> resultsNames(dds)
>>>
>>>
>>> [1] "Intercept"          "condition_A2_vs_A1" "condition_A3_vs_A1"
>>> [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1"
>>> [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1"
>>>
>>> ### Whereas I want something like described in the DESeq2 doc, that is to
>>> say without the "_vs_something"...
>>>
>>> Does anyone know what is wrong is my code (or my experiment design) ?
>>> I hope my message is understable. Don't hesitate to ask me other details.
>>>
>>> Thank you for advance for your attention,
>>>
>>> Best regards,
>>>
>>> M. R.
>>>
>>> NB :
>>>
>>>> sessionInfo()
>>>
>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>>>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] edgeR_3.4.0             limma_3.18.2            DESeq2_1.2.5
>>> [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6             GenomicRanges_1.14.3
>>> [7] XVector_0.2.0           IRanges_1.20.5          BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0
>>>  [4] DBI_0.2-7            genefilter_1.44.0    grid_3.0.2
>>>  [7] lattice_0.20-24      locfit_1.5-9.1       RColorBrewer_1.0-5
>>> [10] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
>>> [13] survival_2.37-4      tools_3.0.2          XML_3.98-1.1
>>> [16] xtable_1.7-1



More information about the Bioconductor mailing list