[BioC] no significant differences in gene expression-limma

Wed Feb 26 19:18:20 CET 2014

Thanks for your help.  It just occurred to me- should plotMDS be done on normalized data, raw data, filtered data?  All of the above?

-----Original Message-----
From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] 
Sent: Tuesday, February 25, 2014 6:50 PM
To: Sabet, Julia A
Cc: Bioconductor mailing list
Subject: RE: no significant differences in gene expression-limma

On Tue, 25 Feb 2014, Sabet, Julia A wrote:

> Thank you for your explanation, Gordon!  I was finally able to figure 
> out how to generate the plot using the following code, with labels 
> corresponding to my 3 different diet groups (the example on the help 
> page definitely helped):

>> plotMDS(mds, col=c(rep("black",12), rep("red",12), rep("blue",12)), 
>> labels= c(rep("c",12), rep("d",12), rep("s",12)))
>
> What resulted was a plot with 2 distinct clusters, but each cluster 
> has a mix of diet groups represented.  I'm pretty sure the clusters 
> are according to batch date, because when I made PCA plots, they look 
> about the same, with the same number of samples in each cluster.

There's no need to guess.  Try

   plotMDS(mds, labels=batchdate)

to confirm whether the clusters are by batch date.

Try looking at MDS dimensions 2, 3 and 4, for example by

   plotMDS(mds, dim=c(3,4))

If you don't see the diets separating on any dimension, then it is hard to argue that you really have systematic differential expression.

> I have already included batch date in my limma models to account for 
> this.  I also performed the arrayQualityMetrics function and there 
> were no outlier arrays that were found.

The issue is with the RNA samples themselves and what they represent, rather than with the quality of the microarrays.

> Is there anything else I should do to try to identify differentially 
> expressed genes?  I am pretty sure there are at least some 
> differences, since we observed differences in body weight among the mice.

You could try:

  ?arrayWeights

Best wishes
Gordon

> Thank you!
>
>
> -----Original Message-----
> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Tuesday, February 25, 2014 6:25 PM
> To: Sabet, Julia A
> Cc: Bioconductor mailing list
> Subject: no significant differences in gene expression-limma
>
> Dear Julia,
>
> I see that you have read the help page for plotMDS(), which is good, and that you have tried to copy the usage line given in the help page into your R session.
>
> This isn't at all necessary.  In practice, you just type
>
>   plotMDS(eset.dblfiltmale)
>
> perhaps adding a few arguments if you want to customize the plot.
>
> I can see that the R style of documentation can be a bit counter-intuitive, but the purpose of the usage line on the help page is to describe the defaults for all the arguments.  The reason that the defaults are set is so that you don't have to set them yourself.  To get an idea of how the function is used in practice it is best to look at the example code given at the bottom of the help page or else at the case studies in the limma User's Guide.
>
> Best wishes
> Gordon
>
>
>> Date: Mon, 24 Feb 2014 15:13:49 +0000
>> From: "Sabet, Julia A" <julia.sabet at tufts.edu>
>> To: Gordon K Smyth <smyth at wehi.EDU.AU>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: Re: [BioC] no significant differences in gene
>> 	expression-limma
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Thank you Gordon.  I looked up the R documentation for plotMDS and did the following code, using my normalized eset, and came up with the following error.  Could you or someone else tell me what I did wrong and if I am missing some code?  I am new to R and had never heard of plotMDS before...I don't really understand what dim.plot is.
>> Thank you!
>>
>>> mds<-plotMDS(eset.dblfiltmale, top=500, 
>>> labels=colnames(eset.dblfiltmale), col=NULL, cex=1, dim.plot=c(1,2), 
>>> ndim=max(dim.plot), gene.selection="pairwise", 
>>> xlab=paste("Dimension",dim.plot[1]),
>>> ylab=paste("Dimension",dim.plot[2]))
>> Error in plotMDS.default(eset.dblfiltmale, top = 500, labels = colnames(eset.dblfiltmale),  :
>>  object 'dim.plot' not found
>>
>>
>>
>> -----Original Message-----
>> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
>> Sent: Saturday, February 22, 2014 2:41 AM
>> To: Sabet, Julia A
>> Cc: Bioconductor mailing list
>> Subject: no significant differences in gene expression-limma
>>
>> Dear Julia,
>>
>> Have you made some plots of your data?  For example, plotMDS().  One should always do this, and it's not the same as an Affy Expression Console quality analysis.
>>
>> Do the treated and untreated groups look different in an MDS plot?  If they do, there you will certainly get DE.  If they don't, then the plot will show you which samples are not doing what you think they should, and then you can follow up why this might be so.
>>
>> Best wishes
>> Gordon
>>
>> PS. There's no rule that says you must use FDR < 0.05.  Larger cutoffs can be used.
>>
>>> Date: Fri, 21 Feb 2014 01:32:06 +0000
>>> From: "Sabet, Julia A" <julia.sabet at tufts.edu>
>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>> Subject: [BioC] no significant differences in gene expression-limma
>>>
>>> Hi all,
>>
>>> I have just analyzed some Affymetrix Mouse Gene 2.0 ST arrays using 
>>> limma and I found no significantly differentially expressed genes 
>>> (according to adjusted p value being <0.05).  I tried a few 
>>> different filtering methods, including no filter, and I tried to 
>>> analyze males and females separately (making separate esets) as well as together.
>>> I adjusted for batch dates as well.  I have a sample size of 36 (12 
>>> per group).  I did the quality control analysis on Affymetrix 
>>> Expression Console (not in R) and everything passed according to 
>>> Affy's quick reference guide and then I proceeded to read in the CEL 
>>> files and analyze using the code below.  Is there any step that I 
>>> have left out, or anything else I can do to try to find significant 
>>> differences?  Any quality control steps in R that I should do 
>>> besides what I've done already, or should I try a different multiple 
>>> comparison method besides BH?  My advisor refuses to believe that 
>>> there could be no significant differences...
>>
>>> Thank you for your help.
>>> Julia
>>>
>>> #Load libraries
>>> library(limma)
>>> library(oligo)
>>> #Load CEL files from working directory celFiles <- list.celfiles() 
>>> affyRaw <- read.celfiles(celFiles) #Normalize/background correct
>>> library(pd.mogene.2.0.st)
>>> eset <- rma(affyRaw)
>>> #Save expression set to an output file 
>>> write.exprs(eset,file="maleLog2transformedNormalizeddata02202014.txt"
>>> )
>>> #Adding gene annotation to eset
>>> library(mogene20sttranscriptcluster.db)
>>> my_frame <- data.frame(exprs(eset))
>>> mogene20sttranscriptcluster()
>>> Annot <-
>>> data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM)
>>> ,
>>> paste, collapse=", "),
>>> SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste, 
>>> collapse=", "), 
>>> DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste, 
>>> collapse=", ")) all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)
>>> write.table(all,file="maleannotateddata02202014.txt",sep="\t")
>>> #Filter background (very low expression) probes
>>> library(genefilter)
>>> ind <- genefilter(eset, filterfun(kOverA(12, 6))) eset.filt <- 
>>> eset[ind,]
>>> dim(eset.filt)
>>> #Or filter by using antigenomic probesets as a measure of 
>>> background(filtered more genes out)
>>> library(pd.mogene.2.0.st)
>>> con <- db(pd.mogene.2.0.st)
>>> dbGetQuery(con, "select * from type_dict;") #type of probes in my
>>> array:
>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) antigm <- 
>>> dbGetQuery(con, "select meta_fsetid from core_mps inner join 
>>> featureSet on core_mps.fsetid=featureSet.fsetid where
>>> featureSet.type='4';")
>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile,
>>> probs=0.95)
>>> minval <- max(bkg)
>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) 
>>> eset.filtB<-eset[ind,]
>>> dim(eset.filtB)
>>> #Filter out control probes and background probes
>>> library(affycoretools)
>>> eset.filtC <- getMainProbes(eset)
>>> library(genefilter)
>>> ind <- genefilter(eset.filtC, filterfun(kOverA(12, 6))) 
>>> eset.dblfiltmale <- eset.filtC[ind,]
>>> dim(eset)
>>> dim(eset.filtC)
>>> dim(eset.dblfilt)
>>> #Read targets file into R
>>> targets <- readTargets("Targets.txt", row.names="Name") #Make design 
>>> matrix with only diet as covariate f <- factor(targets$PaternalDiet,
>>> levels=c("c","d","s")) design <- model.matrix(~0+f)
>>> colnames(design)<-c("c","d","s")
>>> fit <- lmFit(eset.filtB, design)
>>> #Make contrast matrix and models
>>> contrast.matrix <- makeContrasts(d-c, s-d, s-c, levels=design)
>>> fit2 <- contrasts.fit(fit, contrast.matrix)
>>> fit2 <- eBayes(fit2)
>>> #Show most significant differentially expressed genes- do for each 
>>> coefficient separately topTable(fit2, coef=1, adjust="BH") #Show top 
>>> F stats(sig. differences in any contrast) topTableF(fit2, number=30) 
>>> # Make design matrix for 2 factors (sex and diet), adjusted for
>>> batch(date) DS <- paste(targets$PaternalDiet, targets$Sex, sep=".") 
>>> DS<-factor(DS,
>>> levels=c("c.female","d.female","s.female","c.male","d.male","s.male"
>>> )
>>> ) design <- model.matrix(~0+DS+Date, targets)
>>> colnames(design) <- gsub("targets", "", colnames(design)) 
>>> colnames(design)[7:9] <- paste0("Date", 1:3) fit <- 
>>> lmFit(eset.dblfilt, design) cont.matrix <- makeContrasts( 
>>> DvsCinMale=DSd.male-DSc.male, SvsCinMale=DSs.male-DSc.male, 
>>> SvsDinMale=DSs.male-DSd.male, DvsCinFemale=DSd.female-DSc.female,
>>> SvsCinFemale=DSs.female-DSc.female,
>>> SvsDinFemale=DSs.female-DSd.female,
>>> levels=design)
>>> fit2 <- contrasts.fit(fit, cont.matrix)
>>> fit2 <- eBayes(fit2)
>>> topTable(fit2, coef=1, adjust="BH")
>>> topTableF(fit2, number=30)
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:14}}