[BioC] Problem elaborating contrasts for DESeq2 analysis

Marine Rohmer Marine.Rohmer at mgx.cnrs.fr
Tue Jul 22 17:18:20 CEST 2014


Hi Michael,

First of all thank you a lot for your answer!
So you think that the "_vs_" names are only due to the older version ?
On my workplace I can't update these packages but I will try to install 
them on my own computer in the latest version, and see if it goes 
better.

Waiting for that, I've tried as you suggested :

dds <- 
DESeqDataSetFromMatrix(countData=d2$counts,colData=descexp,design=~condition+0)
dds <- DESeq(dds, betaPrior=FALSE)

This second line turns but never ends (At least within 1 hour).

Thank you for your attention,

Best regards,

Marine


Le 2014-07-21 16:32, Michael Love a écrit :
> hi Marine,
> 
> Note that you are using an older version of DESeq2 and Bioconductor
> (DESeq2 v1.4 was released in April 2014). So if you are searching by
> Google for documentation, it might not apply to your software (we
> print the version number on the first page of the vignettes). You can
> always get the correct documentation by using:
> browseVignettes("DESeq2") and ?results for the man page for the
> results() function.
> 
> If you can update to the release version of Bioconductor (by updating
> to R 3.1.0), you can obtain the contrast you are interested in with:
> 
> results(dds, 
> contrast=list(c("conditionA1","conditionA2","conditionA3"),
> c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3,
> -1/3))
> 
> If you aren't able to update to R 3.1.0 and the release version of
> Bioconductor, you can hack your contrast of interest for v1.2 like so:
> 
> design(dds) <- ~ condition + 0
> dds <- DESeq(dds, betaPrior=FALSE)
> # note that the resultsNames(dds) will not be labelled correctly
> # with design having + 0 (they still have the _vs_ in the names)
> # but the following provides the correct contrast
> attr(dds, "modelMatrix")
> results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3)
> 
> Mike
> 
> On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer
> <Marine.Rohmer at mgx.cnrs.fr> wrote:
>> Dear bioconductor mailing list,
>> 
>> I want to do a differential analysis using the GLM with DESeq2.
>> We have 18 RNA samples, including 6 samples subjected to treatment A, 
>> 6
>> others subjected to treatment B, and the last 6 subjected to treatment 
>> C.
>> Each treatment group is divided into 3 colonies, in which one we have 
>> 2
>> replicates (leading us to 6 samples per treatment).
>> Colonies A1, A2 and A3 have been treated with the A treatment, 
>> colonies B*
>> with the B treatment, and colonies C* with the C treatment.
>> (A picture of the experiment design is attached to this email).
>> 
>> These treatments are what we want to study, being aware of the "colony
>> effect".
>> 
>> It is important to note that both factors "colony" and "treatment" are 
>> in
>> linear correlation, so that the classic contrast "A - B" does not work 
>> (-->
>> "design matrix not of full rank").
>> That's why I want to study, for example, the contrast : (A1+A2+A3 -
>> B1-B2-B3)/3 .
>> 
>> I've followed the DESeq2 documentation, and it seems I can do that 
>> with the
>> numeric contrast vector as described in the 3.2. section.
>> My problem is that when typing "resultsNames(ddsCtrst)", I don't have 
>> "A1",
>> "A2", "A3" etc... as described in the doc, but directly some 
>> "A2_vs_A1",
>> "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I 
>> wanted.
>> I don't know why I have these "_vs_something" because I did not write 
>> them
>> myself.
>> 
>> This is my code :
>> 
>> ### Preparing the analysis
>> library(DESeq2)
>> # Using edgeR to have the readDGE function
>> library(edgeR)
>> 
>> # fileT is my target.txt tabulated file, containing :
>> # files group   description     colony
>> # path/to/countsFile.txt        A       A1_1    A1
>> # path/to/countsFile.txt        A       A1_2    A1
>> # path/to/countsFile.txt        A       A2_1    A2
>> # etc...
>> 
>> targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header = 
>> TRUE,
>> comment.char="" )
>> d2 <- readDGE(targets, columns=c(1,2), header = FALSE)
>> d.RLE <- d2
>> d.RLE <- calcNormFactors(d.RLE,method="RLE")
>> 
>> ### Preparing the design and contrast
>> descexp=c(d2$samples$colony)
>> # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc...
>> descexp=data.frame(condition=descexp)
>> 
>>> descexp
>> 
>>    condition
>> 1         A1
>> 2         A1
>> 3         A2
>> 4         A2
>> 5         A3
>> 6         A3
>> 7         B1
>> 8         B1
>> 9         B2
>> 10        B2
>> 11        B3
>> 12        B3
>> 13        C1
>> 14        C1
>> 15        C2
>> 16        C2
>> 17        C3
>> 18        C3
>> 
>> dds <- DESeqDataSetFromMatrix(countData =
>> d2$counts,colData=descexp,design=~condition)
>> 
>> ### Checking the condition : each colony in two replicates
>>> 
>>> dds$condition
>> 
>>  [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3
>> Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3
>> 
>> ### Analysis
>> dds <- DESeq(dds)
>> 
>> ### Now when typing resultsNames(dds) :
>>> 
>>> resultsNames(dds)
>> 
>> [1] "Intercept"          "condition_A2_vs_A1" "condition_A3_vs_A1"
>> [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1"
>> [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1"
>> 
>> ### Whereas I want something like described in the DESeq2 doc, that is 
>> to
>> say without the "_vs_something"...
>> 
>> Does anyone know what is wrong is my code (or my experiment design) ?
>> I hope my message is understable. Don't hesitate to ask me other 
>> details.
>> 
>> Thank you for advance for your attention,
>> 
>> Best regards,
>> 
>> M. R.
>> 
>> NB :
>> 
>>> sessionInfo()
>> 
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>> 
>> locale:
>>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  
>> methods
>> [8] base
>> 
>> other attached packages:
>> [1] edgeR_3.4.0             limma_3.18.2            DESeq2_1.2.5
>> [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6             
>> GenomicRanges_1.14.3
>> [7] XVector_0.2.0           IRanges_1.20.5          BiocGenerics_0.8.0
>> 
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0
>>  [4] DBI_0.2-7            genefilter_1.44.0    grid_3.0.2
>>  [7] lattice_0.20-24      locfit_1.5-9.1       RColorBrewer_1.0-5
>> [10] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
>> [13] survival_2.37-4      tools_3.0.2          XML_3.98-1.1
>> [16] xtable_1.7-1



More information about the Bioconductor mailing list