[BioC] DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")

Michael Love michaelisaiahlove at gmail.com
Mon Mar 17 14:33:30 CET 2014


hi Olga and Karen,

Thanks for your interest in DESeq2 and for using the development
branch, which allows us to clarify changes before they enter the
release branch.

This concerns a modeling change from v1.2 to v1.3. Changes like these
are documented in the NEWS file with pointers to the
functions/arguments that changed:
(e.g. http://www.bioconductor.org/packages/2.14/bioc/news/DESeq2/NEWS)
I've been working on including more documentation in the package, but
haven't gotten to adding the parts about interaction terms yet, so
this gives me a chance to explain a bit.

The new resultsNames(dds) are due to "expanded model matrices", which
include a column in the design matrix for each level of a factor, as
opposed to "standard model matrices" produced by model.matrix, which
have a column in the deisgn matrix for the levels other than the base
level. Expanded model matrices are described a bit in the man pages
and in the vignette, but not yet using an example with interaction
effects. This makes the DE analysis independent of the choice to base
level, whereas previously the log fold change (LFC) shrinkage was not
independent of base level choices. In addition, it simplifies certain
comparisons.

You now have the columns "metasmet_no" and "metasmet_yes", whereas
before you had "metas_met_yes_vs_met_no".  Previously you would use
the 'name' argument to extract this comparison, whereas with version
>= 1.3 we want to encourage users to use the 'contrast' argument:

results(dds, contrast=c("metas","met_yes","met_no"))

If the analysis includes interaction terms, previously the above
contrast would give the metastasis effect only for the base level of
the other factors in the design, but for version >= 1.3, this gives
the overall metastasis effect over all levels of the time variable.

If you want the individual time-period-specific effects, this is the
overall effect plus the interaction effects for that time period.You
have the following columns of the model matrix:

> [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
>  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
> [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"

So the metastasis effect specific to time period 2 would be, the
contrast of "metasmet_yes" vs "metasmet_no" and additionally, the
contrast of "timetime_2.metasmet_yes" vs "timetime_2.metasmet_no". We
can pull these out using the numeric contrast vector, starting with
all 0's and then filing in the -1's and 1's:

ctrst <- numeric(21)
rn <- resultsNames(dds)
ctrst[ rn == "metasmet_no" ] <- -1
ctrst[ rn == "metasmet_yes" ] <- 1
ctrst[ rn == "timetime_2.metasmet_no" ] <- -1
ctrst[ rn == "timetime_2.metasmet_yes" ] <- 1
results(dds, contrast=ctrst)


Mike



On Mon, Mar 17, 2014 at 5:07 AM, solgakar at bi.technion.ac.il
<solgakar at bi.technion.ac.il> wrote:
>
>
>
> From: Karen Chait [mailto:kchait at tx.technion.ac.il]
> Sent: Monday, March 17, 2014 10:57 AM
> To: Olga Karinsky
> Subject: RE: using DESeq2 with multi factor data
>
> Hello all,
> I am trying to use the DESeq2 package to perform RNA-Seq analysis on a data containing several factors.
> I have been closely following the emails between Ming Yi and Michael Love, because I think that my problem is very similar to what they have discussed.  But even though I received a lot of useful information from their discussion, I still have several questions regarding my specific data.
> Just as an overall information regarding my data, I have 96 samples and the two factors I am interested in exploring are "time" and "metastasis".
> In order to build my data set I used the following commands:
>
> >  countData = read.table("merged_counts.txt", header=TRUE, row.names=1)
>
> >  metasVector=c("met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes"(
>
> >  timePointsVector=c("6","4","6","6","3","6","3","5","6","6","1","5","3","4","3","6","6","6","2","6","1","2","4","6","5","5","5","3","6","5","6","2","6","6","1","5","5","6","6","6","6","6","6","4","2","6","3","1","2","5","6","1","1","3","6","3","6","4","4","5","6","6","3","5","4","6","1","4","3","1","1","1","4","2","1","1","3","6","1","1","2","1","6","3","3","2","5","3","2","3","1","4","1","1","6","1")
>
> >  colData=data.frame(row.names=colnames(countData),metas=metasVector,gender=gendarVector)
>
> >  colData$metas=factor(colData$metas, levels=c("met_no","met_yes"))
>
> >  colData$time = factor(colData$time, levels = c("1", "2", "3", "4", "5", "6"))
>
> >  dds=DESeqDataSetFromMatrix(countData=tmpcountData, colData=colData, design=~time + metas + metas:time)
>
> >  dds=DESeq(dds)
> I have several questions:
>
> -          first of all I have tried running those commands on DESeq2 version 1.2.10 (R version 3.0.2) and DESeq2 version 1.3.47 (R version 3.0.2) and what I have received from the resultsNames() function I both cases is very different. Using the 1.2.10 version I have received:
>
> > resultsNames(dds)
>
> [1] "Intercept"               "time_2_vs_1"             "time_3_vs_1"             "time_4_vs_1"             "time_5_vs_1"             "time_6_vs_1"             "metas_met_yes_vs_met_no" "time2.metasmet_yes"
>
>  [9] "time3.metasmet_yes"      "time4.metasmet_yes"      "time5.metasmet_yes"      "time6.metasmet_yes"
>
> > sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
>
>
>
> attached base packages:
>
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> other attached packages:
>
> [1] DESeq2_1.2.10             RcppArmadillo_0.4.100.2.1 Rcpp_0.11.0               GenomicRanges_1.14.4      XVector_0.2.0             IRanges_1.20.7            BiocGenerics_0.8.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    grid_3.0.2           lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4
>
> [11] splines_3.0.2        stats4_3.0.2         survival_2.37-7      tools_3.0.2          XML_3.98-1.1         xtable_1.7-3
>
>
>
> Using the 1.3.47 version I have received:
>
> > resultsNames(dds)
>
> [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
>
>  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
>
> [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
>
> [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
>
> > sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
>
>
>
> attached base packages:
>
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> other attached packages:
>
> [1] DESeq2_1.3.47           RcppArmadillo_0.4.100.0 Rcpp_0.11.0             GenomicRanges_1.14.4    XVector_0.2.0           IRanges_1.20.7
>
> [7] BiocGenerics_0.8.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2
>
>  [8] lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-7
>
> [15] XML_3.98-1.1         xtable_1.7-3
>
> (I have ran the 1.3.47 version the same way besides a difference in the names of the time levels, but I do not believe that this is the reason for the differences)
>                 I don't fully understand the results I receive using the 1.3.47 version and even more the difference between the versions.
>
> -          From my understanding, the results I received using the 1.2.10 version are the more reasonable and they fit my settings of base levels in the data. Now after receiving these results I would love to understand how do I receive different contrast testing? For each time period metas_yes vs. metas_no (for example timetime_2.metasmet_yes vs. timetime_2.metasmet_no)
>
>
>
> Thank you in advance,
>
> Olga and Karen
>
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list