[BioC] Problem elaborating contrasts for DESeq2 analysis

Michael Love michaelisaiahlove at gmail.com
Mon Jul 21 16:32:48 CEST 2014


hi Marine,

Note that you are using an older version of DESeq2 and Bioconductor
(DESeq2 v1.4 was released in April 2014). So if you are searching by
Google for documentation, it might not apply to your software (we
print the version number on the first page of the vignettes). You can
always get the correct documentation by using:
browseVignettes("DESeq2") and ?results for the man page for the
results() function.

If you can update to the release version of Bioconductor (by updating
to R 3.1.0), you can obtain the contrast you are interested in with:

results(dds, contrast=list(c("conditionA1","conditionA2","conditionA3"),
c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3,
-1/3))

If you aren't able to update to R 3.1.0 and the release version of
Bioconductor, you can hack your contrast of interest for v1.2 like so:

design(dds) <- ~ condition + 0
dds <- DESeq(dds, betaPrior=FALSE)
# note that the resultsNames(dds) will not be labelled correctly
# with design having + 0 (they still have the _vs_ in the names)
# but the following provides the correct contrast
attr(dds, "modelMatrix")
results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3)

Mike

On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer
<Marine.Rohmer at mgx.cnrs.fr> wrote:
> Dear bioconductor mailing list,
>
> I want to do a differential analysis using the GLM with DESeq2.
> We have 18 RNA samples, including 6 samples subjected to treatment A, 6
> others subjected to treatment B, and the last 6 subjected to treatment C.
> Each treatment group is divided into 3 colonies, in which one we have 2
> replicates (leading us to 6 samples per treatment).
> Colonies A1, A2 and A3 have been treated with the A treatment, colonies B*
> with the B treatment, and colonies C* with the C treatment.
> (A picture of the experiment design is attached to this email).
>
> These treatments are what we want to study, being aware of the "colony
> effect".
>
> It is important to note that both factors "colony" and "treatment" are in
> linear correlation, so that the classic contrast "A - B" does not work (-->
> "design matrix not of full rank").
> That's why I want to study, for example, the contrast : (A1+A2+A3 -
> B1-B2-B3)/3 .
>
> I've followed the DESeq2 documentation, and it seems I can do that with the
> numeric contrast vector as described in the 3.2. section.
> My problem is that when typing "resultsNames(ddsCtrst)", I don't have "A1",
> "A2", "A3" etc... as described in the doc, but directly some "A2_vs_A1",
> "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I wanted.
> I don't know why I have these "_vs_something" because I did not write them
> myself.
>
> This is my code :
>
> ### Preparing the analysis
> library(DESeq2)
> # Using edgeR to have the readDGE function
> library(edgeR)
>
> # fileT is my target.txt tabulated file, containing :
> # files group   description     colony
> # path/to/countsFile.txt        A       A1_1    A1
> # path/to/countsFile.txt        A       A1_2    A1
> # path/to/countsFile.txt        A       A2_1    A2
> # etc...
>
> targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header = TRUE,
> comment.char="" )
> d2 <- readDGE(targets, columns=c(1,2), header = FALSE)
> d.RLE <- d2
> d.RLE <- calcNormFactors(d.RLE,method="RLE")
>
> ### Preparing the design and contrast
> descexp=c(d2$samples$colony)
> # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc...
> descexp=data.frame(condition=descexp)
>
>> descexp
>
>    condition
> 1         A1
> 2         A1
> 3         A2
> 4         A2
> 5         A3
> 6         A3
> 7         B1
> 8         B1
> 9         B2
> 10        B2
> 11        B3
> 12        B3
> 13        C1
> 14        C1
> 15        C2
> 16        C2
> 17        C3
> 18        C3
>
> dds <- DESeqDataSetFromMatrix(countData =
> d2$counts,colData=descexp,design=~condition)
>
> ### Checking the condition : each colony in two replicates
>>
>> dds$condition
>
>  [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3
> Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3
>
> ### Analysis
> dds <- DESeq(dds)
>
> ### Now when typing resultsNames(dds) :
>>
>> resultsNames(dds)
>
> [1] "Intercept"          "condition_A2_vs_A1" "condition_A3_vs_A1"
> [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1"
> [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1"
>
> ### Whereas I want something like described in the DESeq2 doc, that is to
> say without the "_vs_something"...
>
> Does anyone know what is wrong is my code (or my experiment design) ?
> I hope my message is understable. Don't hesitate to ask me other details.
>
> Thank you for advance for your attention,
>
> Best regards,
>
> M. R.
>
> NB :
>
>> sessionInfo()
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] edgeR_3.4.0             limma_3.18.2            DESeq2_1.2.5
> [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6             GenomicRanges_1.14.3
> [7] XVector_0.2.0           IRanges_1.20.5          BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0
>  [4] DBI_0.2-7            genefilter_1.44.0    grid_3.0.2
>  [7] lattice_0.20-24      locfit_1.5-9.1       RColorBrewer_1.0-5
> [10] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
> [13] survival_2.37-4      tools_3.0.2          XML_3.98-1.1
> [16] xtable_1.7-1



More information about the Bioconductor mailing list