[BioC] DESeq2, problems using list argument to extract results from DESeq

Michael Love michaelisaiahlove at gmail.com
Thu Jul 31 15:50:59 CEST 2014


hi Jon,

On Thu, Jul 31, 2014 at 2:13 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:
> Hi,
>
>> dds$condition
>  [1] Early vitellogenesis Early vitellogenesis Early vitellogenesis Late
> vitellogenesis
>  [5] Late vitellogenesis  Late vitellogenesis  Late vitellogenesis
> Fertilization
>  [9] Fertilization        Early cleavage       Early cleavage       Late
> cleavage
> [13] Late cleavage        Early preinversion   Early preinversion   Late
> preinversion
> [17] Late preinversion    Early postinversion  Early postinversion  Late
> postinversion
> [21] Non reproductive     Non reproductive     Non reproductive     Non
> reproductive
> [25] Non reproductive     Non reproductive
> 10 Levels: Non reproductive Early vitellogenesis Late vitellogenesis
> Fertilization ... Late postinversion
>
>> dds
> class: DESeqDataSet
> dim: 6137 26
> exptData(0):
> assays(3): counts mu cooks
> rownames(6137): scign000005 scign000006 ... scign021663 scign021669
> rowData metadata column names(57): baseMean baseVar ... deviance maxCooks
> colnames(26): 05.apr 11.apr ... R2D0T1 R3D0T1
> colData names(2): condition sizeFactor
>
>
> On 31. juli 2014, at 02:28, Michael Love wrote:
>
> Hi Jon,
>
> On Jul 30, 2014 3:27 PM, "Jon Bråte"
>> thank you for your answer. What I am interested in is to find genes that
>> possibly play a part in embryogenesis. In other words, genes that are
>> up-regulated in any of the embryogenesis stages compared to the
>> non-reproductive stages.
>
> Can you tell us the levels of condition which are embryogenesis vs those
> which are non-reproductive?
>
>>
>> I have tried three different approaches, but struggle to understand what
>> they mean, how they differ, and if they actually make sense. I try to
>> explain what I did, sorry if it is difficult to read:
>>
>>
>> First I tried to contrast all the embryogenesis stages compared to the non
>> reproductive using the Wald test in DESeq:
>>
>> dds = DESeq(dds)
>> resCtrst = results(dds, contrast=list(c("conditionEarly.vitellogenesis",
>> "conditionLate.vitellogenesis", "conditionFertilization",
>> "conditionEarly.cleavage",
>>                                         "conditionLate.cleavage",
>> "conditionEarly.preinversion", "conditionLate.preinversion",
>>                                         "conditionEarly.postinversion",
>> "conditionLate.postinversion"), "conditionNon.reproductive"))

Here you need to include the results() argument listValues=c(1/9, -1).
This is to compare the average of the 9 embryogenic levels to the 1
non-reproductive level. This seems like the correct results table for
your question.  Additionally you can perform individual contrasts of
one level with the non-reproductive level:

res = results(dds,
contrast=c("condition","Early.vitellogenesis","Non.reproductive"))


>> resCtrst = resCtrst[order(resCtrst$padj),]
>> resSigCtrst = resCtrst[which(resCtrst$padj < 0.1), ]
>> resSigCtrstPositive = resSigCtrst[which(resSigCtrst$log2FoldChange > 0), ]
>> write.csv2(resSigCtrstPositive,
>> file="DESeq2_Wald_test_Sig_upreg_all_cond_vs_non_repr.csv")
>>
>> This gave me 1736 upregulated genes.
>>
>>
>> Second I ran an LTR test like this:
>>
>> ddsLRT = DESeq(dds, test="LRT", full=~condition, reduced=~1)
>> resLRT = results(ddsLRT)
>> resLRT = resLRT[order(resLRT$padj),]
>> resSigLRT = resLRT[which(resLRT$padj < 0.1), ]
>> resSigLRTPositive = resSigLRT[which(resSigLRT$log2FoldChange > 0), ]
>> write.csv2(resSigLRT, file="DESeq2_Sig_LRT_upreg.csv")
>>
>> This gave me 3158 upregulated genes (all genes have positive log2fold
>> changes so I guess I cannot see which are up or down regulated?)
>>

The log2 fold change which is shown is for just one of the 9
coefficients in the model (the last level over the base level).

In the ?results help file we have:

     For analyses using the likelihood ratio test (using ‘nbinomLRT’),
     the p-values are determined solely by the difference in deviance
     between the full and reduced model formula. A log2 fold change is
     included, which can be controlled using the ‘name’ argument, or by
     default this will be the estimated coefficient for the last
     element of ‘resultsNames(object)’.


>> And I don't understand exactly what is contained in resLRT, are the
>> significant genes here significantly expressed between Non reproductive
>> samples and all others compared to the null hypothesis?

The LRT in this case is testing for any changes across *any* levels of
condition. So this sounds like not the test which you are interested
in answering.


>> Or is it just for
>> the two conditions Late.postinversion vs Non reproductive?:
>>
>> mcols(resLRT)
>> DataFrame with 6 rows and 2 columns
>>           type
>> description
>>    <character>
>> <character>
>> 1 intermediate                                        the base mean over
>> all
>> rows
>> 2      results log2 fold change: condition Late.postinversion vs
>> Non.reproductive
>> 3      results   standard error: condition Late.postinversion vs
>> Non.reproductive
>> 4      results                              LRT statistic: '~ condition'
>> vs
>> '~ 1'
>> 5      results                                LRT p-value: '~ condition'
>> vs
>> '~ 1'
>> 6      results                                               BH adjusted
>> p-values
>>
>>
>> Third I tried to compare the Wald test and the LTR:
>>
>> res = results(dds) #should I use resCtrst instead of res??

from the ?results help file:

     If ‘results’ is run without specifying
     ‘contrast’ or ‘name’, it will return the comparison of the last
     level of the last variable in the design formula over the first
     level of this variable.


Mike

>> tab = table(Wald=res$padj < .1, LRT=resLRT$padj < .1)
>> addmargins(tab)
>>
>>        LRT
>> Wald    FALSE TRUE  Sum
>>   FALSE  1027 1766 2793
>>   TRUE    526  747 1273
>>   Sum    1553 2513 4066
>>
>>
>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> base
>>
>> other attached packages:
>> [1] DESeq2_1.4.5            RcppArmadillo_0.4.320.0 Rcpp_0.11.2
>> GenomicRanges_1.16.3
>> [5] GenomeInfoDb_1.0.2      IRanges_1.22.9          BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>>  [1] AnnotationDbi_1.26.0 Biobase_2.24.0       DBI_0.2-7
>> RColorBrewer_1.0-5   RSQLite_0.11.4
>>  [6] XML_3.98-1.1         XVector_0.4.0        annotate_1.42.0
>> genefilter_1.46.1    geneplotter_1.42.0
>> [11] grid_3.1.0           lattice_0.20-29      locfit_1.5-9.1
>> splines_3.1.0        stats4_3.1.0
>> [16] survival_2.37-7      tools_3.1.0          xtable_1.7-3
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> ----------------------------------------------------------------
> Jon Bråte
>
> Section for Genetics and Evolutionary Biology (EVOGENE)
> Department of Biosciences
> University of Oslo
> P.B. 1066 Blindern
> N-0316, Norway
> Email: jon.brate at ibv.uio.no
> Phone: 922 44 582
> Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html
>
>
>
>



More information about the Bioconductor mailing list