[BioC] DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")

Karen Chait kchait at techunix.technion.ac.il
Mon Mar 31 08:42:27 CEST 2014

Hi Michael,
When I define 2 factors when each has only 2 levels I understand I can't use the expanded matrix. I want to perform the following analysis-
> colData=data.frame(row.names=colnames(tmpcountData),metas=metasVector,time=timeVector)
> colData\$metas=factor(colData\$metas,levels=c("met_no","met_yes"))
> colData\$time=factor(colData\$time,levels=c("1","2"))
> dds2=DESeqDataSetFromMatrix(countData=tmpcountData,colData=colData,design=~time+metas+metas:time)
> dds2=DESeq(dds2)
> resultsNames(dds2)
[1] "Intercept"               "time_2_vs_1"             "metas_met_yes_vs_met_no" "time2.metasmet_yes"

The comparisons I want to run are :
1. all samples met_yes vs. met_no
2. Samples with time=1 met_yes vs. met_no
3. Samples with time=2 met_yes vs. met_no

I get the same results for comparisons 1 and 2
> res1=results(dds,contrast=c("metas","met_yes","met_no"),alpha=0.05)
> res2=results(dds,contrast=c(0,0,1,0),alpha=0.05)
> res3=results(dds,contrast=c(0,0,1,1),alpha=0.05)

I understand that for the first comparison I can define the design without the time factor.
For the second comparison, does DESeq calculate the differential expression only between the samples with time=1?
If not, can you explain how DESeq2 calculates the DE for these comparisons.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[3] Rcpp_0.11.1               GenomicRanges_1.14.4
[5] XVector_0.2.0             IRanges_1.20.7
[7] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0
[4] DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0
[7] grid_3.0.3           lattice_0.20-27      locfit_1.5-9.1
[10] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.3
[13] stats4_3.0.3         survival_2.37-7      tools_3.0.3
[16] XML_3.98-1.1         xtable_1.7-3

Thank you so much,
Karen and Olga

-----Original Message-----
From: Michael Love [mailto:michaelisaiahlove at gmail.com]
Sent: Monday, March 24, 2014 5:24 PM
To: solgakar at bi.technion.ac.il; karen chait (kchait at tx.technion.ac.il)
Cc: bioconductor at r-project.org
Subject: Re: DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")

hi Olga and Karen,

I updated the results() function in DESeq2 version 1.3 to simplify extracting the kind of contrast we were discussing, combining main effects and interactions. While I emailed the following code to build a numeric contrast,

> ctrst <- numeric(21)
> rn <- resultsNames(dds)
> ctrst[ rn == "metasmet_no" ] <- -1
> ctrst[ rn == "metasmet_yes" ] <- 1
> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1 ctrst[ rn ==
> "timetime_2.metasmet_yes" ] <- 1 results(dds, contrast=ctrst)

this can now be accomplished by giving a list to the 'contrast'
argument. The first element of the list should be the effects for the numerator of the log fold change, and the second element should be the effects for the denominator:

results( dds, contrast = list(
c("metasmet_yes", "timetime_2.metasmet_yes"),
c("metasmet_no", "timetime_2.metasmet_no") ) )

There are more details and examples in the man page for ?results for the development branch.

Mike

On Mon, Mar 17, 2014 at 9:33 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
>
> hi Olga and Karen,
>
> Thanks for your interest in DESeq2 and for using the development
> branch, which allows us to clarify changes before they enter the
> release branch.
>
> This concerns a modeling change from v1.2 to v1.3. Changes like these
> are documented in the NEWS file with pointers to the
> functions/arguments that changed:
> (e.g. http://www.bioconductor.org/packages/2.14/bioc/news/DESeq2/NEWS)
> I've been working on including more documentation in the package, but
> haven't gotten to adding the parts about interaction terms yet, so
> this gives me a chance to explain a bit.
>
> The new resultsNames(dds) are due to "expanded model matrices", which
> include a column in the design matrix for each level of a factor, as
> opposed to "standard model matrices" produced by model.matrix, which
> have a column in the deisgn matrix for the levels other than the base
> level. Expanded model matrices are described a bit in the man pages
> and in the vignette, but not yet using an example with interaction
> effects. This makes the DE analysis independent of the choice to base
> level, whereas previously the log fold change (LFC) shrinkage was not
> independent of base level choices. In addition, it simplifies certain
> comparisons.
>
> You now have the columns "metasmet_no" and "metasmet_yes", whereas
> before you had "metas_met_yes_vs_met_no".  Previously you would use
> the 'name' argument to extract this comparison, whereas with version
> >= 1.3 we want to encourage users to use the 'contrast' argument:
>
> results(dds, contrast=c("metas","met_yes","met_no"))
>
> If the analysis includes interaction terms, previously the above
> contrast would give the metastasis effect only for the base level of
> the other factors in the design, but for version >= 1.3, this gives
> the overall metastasis effect over all levels of the time variable.
>
> If you want the individual time-period-specific effects, this is the
> overall effect plus the interaction effects for that time period.You
> have the following columns of the model matrix:
>
> > [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
> >  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
> > [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
>
> So the metastasis effect specific to time period 2 would be, the
> contrast of "metasmet_yes" vs "metasmet_no" and additionally, the
> contrast of "timetime_2.metasmet_yes" vs "timetime_2.metasmet_no". We
> can pull these out using the numeric contrast vector, starting with
> all 0's and then filing in the -1's and 1's:
>
> ctrst <- numeric(21)
> rn <- resultsNames(dds)
> ctrst[ rn == "metasmet_no" ] <- -1
> ctrst[ rn == "metasmet_yes" ] <- 1
> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1 ctrst[ rn ==
> "timetime_2.metasmet_yes" ] <- 1 results(dds, contrast=ctrst)
>
>
> Mike
>
>
>
> On Mon, Mar 17, 2014 at 5:07 AM, solgakar at bi.technion.ac.il
> <solgakar at bi.technion.ac.il> wrote:
> >
> >
> >
> > From: Karen Chait [mailto:kchait at tx.technion.ac.il]
> > Sent: Monday, March 17, 2014 10:57 AM
> > To: Olga Karinsky
> > Subject: RE: using DESeq2 with multi factor data
> >
> > Hello all,
> > I am trying to use the DESeq2 package to perform RNA-Seq analysis on a data containing several factors.
> > I have been closely following the emails between Ming Yi and Michael Love, because I think that my problem is very similar to what they have discussed.  But even though I received a lot of useful information from their discussion, I still have several questions regarding my specific data.
> > Just as an overall information regarding my data, I have 96 samples and the two factors I am interested in exploring are "time" and "metastasis".
> > In order to build my data set I used the following commands:
> >
> > > row.names=1)
> >
> > >
> > > metasVector=c("met_no","met_no","met_no","met_no","met_no","met_no
> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
> > > _no","met_no","met_no","met_no","met_no","met_no","met_no","met_no
> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
> > > _no","met_no","met_no","met_no","met_no","met_no","met_no","met_no
> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
> > > _no","met_no","met_yes","met_yes","met_yes","met_yes","met_yes","m
> > > et_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes
> > > ","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met
> > > _yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes",
> > > "met_yes","met_yes","met_yes","met_yes","met_yes"(
> >
> > >
> > > timePointsVector=c("6","4","6","6","3","6","3","5","6","6","1","5"
> > > ,"3","4","3","6","6","6","2","6","1","2","4","6","5","5","5","3","
> > > 6","5","6","2","6","6","1","5","5","6","6","6","6","6","6","4","2"
> > > ,"6","3","1","2","5","6","1","1","3","6","3","6","4","4","5","6","
> > > 6","3","5","4","6","1","4","3","1","1","1","4","2","1","1","3","6"
> > > ,"1","1","2","1","6","3","3","2","5","3","2","3","1","4","1","1","
> > > 6","1")
> >
> > >
> > > colData=data.frame(row.names=colnames(countData),metas=metasVector
> > > ,gender=gendarVector)
> >
> > >  colData\$metas=factor(colData\$metas, levels=c("met_no","met_yes"))
> >
> > >  colData\$time = factor(colData\$time, levels = c("1", "2", "3",
> > > "4", "5", "6"))
> >
> > >  dds=DESeqDataSetFromMatrix(countData=tmpcountData,
> > > colData=colData, design=~time + metas + metas:time)
> >
> > >  dds=DESeq(dds)
> > I have several questions:
> >
> > -          first of all I have tried running those commands on DESeq2 version 1.2.10 (R version 3.0.2) and DESeq2 version 1.3.47 (R version 3.0.2) and what I have received from the resultsNames() function I both cases is very different. Using the 1.2.10 version I have received:
> >
> > > resultsNames(dds)
> >
> > [1] "Intercept"               "time_2_vs_1"             "time_3_vs_1"             "time_4_vs_1"             "time_5_vs_1"             "time_6_vs_1"             "metas_met_yes_vs_met_no" "time2.metasmet_yes"
> >
> >  [9] "time3.metasmet_yes"      "time4.metasmet_yes"      "time5.metasmet_yes"      "time6.metasmet_yes"
> >
> > > sessionInfo()
> >
> > R version 3.0.2 (2013-09-25)
> >
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> > locale:
> >
> > [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
> >
> >
> >
> > attached base packages:
> >
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> >
> > other attached packages:
> >
> > [1] DESeq2_1.2.10             RcppArmadillo_0.4.100.2.1 Rcpp_0.11.0               GenomicRanges_1.14.4      XVector_0.2.0             IRanges_1.20.7            BiocGenerics_0.8.0
> >
> >
> >
> > loaded via a namespace (and not attached):
> >
> > [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    grid_3.0.2           lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4
> >
> > [11] splines_3.0.2        stats4_3.0.2         survival_2.37-7      tools_3.0.2          XML_3.98-1.1         xtable_1.7-3
> >
> >
> >
> > Using the 1.3.47 version I have received:
> >
> > > resultsNames(dds)
> >
> > [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
> >
> >  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
> >
> > [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> >
> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
> >
> > > sessionInfo()
> >
> > R version 3.0.2 (2013-09-25)
> >
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> > locale:
> >
> > [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
> >
> >
> >
> > attached base packages:
> >
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> >
> > other attached packages:
> >
> > [1] DESeq2_1.3.47           RcppArmadillo_0.4.100.0 Rcpp_0.11.0             GenomicRanges_1.14.4    XVector_0.2.0           IRanges_1.20.7
> >
> > [7] BiocGenerics_0.8.0
> >
> >
> >
> > loaded via a namespace (and not attached):
> >
> > [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2
> >
> >  [8] lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-7
> >
> > [15] XML_3.98-1.1         xtable_1.7-3
> >
> > (I have ran the 1.3.47 version the same way besides a difference in the names of the time levels, but I do not believe that this is the reason for the differences)
> >                 I don't fully understand the results I receive using the 1.3.47 version and even more the difference between the versions.
> >
> > -          From my understanding, the results I received using the 1.2.10 version are the more reasonable and they fit my settings of base levels in the data. Now after receiving these results I would love to understand how do I receive different contrast testing? For each time period metas_yes vs. metas_no (for example timetime_2.metasmet_yes vs. timetime_2.metasmet_no)
> >
> >
> >
> > Thank you in advance,
> >
> > Olga and Karen
> >
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor