[BioC] multi-level, multi-factor results interpretation.

Wolfgang Huber whuber at embl.de
Sat Apr 21 09:06:26 CEST 2012


Dear Ingrid

I wonder whether you really sent the code that you ran, because 
'AlterativeVariable' is sometimes spellt with 'n' and sometimes without, 
and R wouldn't normally tolerate that.

The question you are asking is not specific for NGS data or for DESeq, 
but rather concerns multivariate and multilevel ANOVA in general. I see 
two main options, as you suggest:

1. test for more specific hypotheses, such as ctrl vs x and ctrl vs y 
separately.

2. do a scatterplot with your significant hits (ctrl-vs-x on one axis, 
and ctrl-vs-y on the other; possibly old vs new on the third) to get an 
overview of the hits and cluster them into subgroups.

	Best wishes
	Wolfgang




Apr/20/12 11:37 PM, Ingrid Lindquist scripsit::
> Hi Simon,
>
> I am working with a multi-factor experiment that, in a sense, doesn't
> have biological replicates (replicates were done at different times, and
> the differences in the chemistry/instrumentation is enough to be a
> rather large component of variance) and has three levels. By three
> levels, I mean there are 2 different treatments and 1 control. When I
> push the workflow through, everything works fine, but I'm having a hard
> time delineating which level is responsible for a gene being identified
> as significantly differentially expressed. I understand that (in the
> following example) ConditionX, ConditionY, AlternativeVariableold are
> log2 fold changes in relation to "ctrl" (in the case of condition) or
> "new" (in the case of AlternativeVariable). Can I assume that the
> largest FC of these three fields within each gene instance is likely the
> reason that gene had a padj within significance? My plan is to separate
> out the dataset so that I'm only running 2 levels at once (x vs ctrl; y
> vs ctrl), but I'm hoping you can shed light on how I can interpret these
> results I've mentioned here.
>
> The output I'm referring to has the following headers (generated by the
> last line (write.table) in my workflow below):
>
> Gene Pval Padj Intercept ConditionX ConditionY AlternativeVariableold
> Deviance Converged
>
>
> My workflow:
>
> counts <- (file, header=TRUE, row.names=1)
>
> Design<- data.frame(
> + row.names = colnames(SampledCounts),
> + Condition = c( "x", "ctrl", "ctrl", "x", "y", "y"),
> + AlterativeVariable= c( "old", "new", "old", "new", "new", "old"))
>
> library(DESeq)
>
> cdsFull <- newCountDataSet (counts, Design)
> cdsFull <-estimateSizeFactors(cdsFull)
> cdsFull <-estimateDispersions(cdsFull, method="blind",
> sharingMode="fit-only")
> fit1<-fitNbinomGLMs(cdsFull, count ~ condition + AlternativeVariable)
> fit0<-fitNbinomGLMs(cdsFull, count ~ AlternativeVariable)
> pvalsGLM <-nbinomGLMTest(fit1, fit0)
> padjGLM <-p.adjust(pvalsGLM, method = "BH")
> write.table(data.frame(geneID=row.names(counts(cdsFull)), pval=pvalsGLM,
> padj=padjGLM, fit1), file= "result.txt")
>
> Thanks for your help,
> Ingrid
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Best wishes
	Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list