[BioC] Comparing Two Affymetrix Arrays Question

Wed Jun 29 15:37:02 CEST 2011

Hi Ramzi,

As far as I can tell, your code looks correct. So yes, group A and 
group B are too similar to show significant differences after 
correction for multiple testing. That's not the same as saying there 
are no genes that are differentially expressed between the two, just 
that you can't pick them out of the expected false positives. You 
could also see this visually by doing some sort of hierarchical 
clustering and/or PCA clustering on your samples - Wild type sample 
would probably cluster together, but the Group A and Group B would be 
intermingled. You can still rank your genes in the A vs. B comparison 
and looks among the top genes to look for interesting candidates - 
you could do qPCR on additional samples for these potential 
candidates to confirm them. You could also do something like GSEA 
that uses the entire ranked list of genes to see if any pathways or 
GO terms are over-represented at the top of the list; if there were 
truly no significant differences, then the placement of genes should 
be random in your lists.

HTH,
Jenny

At 08:44 AM 6/24/2011, Abboud, Ramzi wrote:
>Jenny et al.,
>
>Thanks again for the help so far.
>
>I did some reading about FDR and it makes perfect sense.  If you are 
>considering 45,000 genes it is much more likely you will see a 
>difference between the two data sets, and this needs to be corrected for.
>
>Following your advice, I reverted my code back to fitting one model 
>on all 3 groups in order to get more power.  I then pull out the 
>pairwise comparisons that I want and write them out to files.  The 
>comparisons again are Group A vs Group B, Group A vs Wild Type, and 
>Group B vs Wild Type.  Below is my code, does it look correct? There 
>are three main things I'd like to confirm are correct.  1 - The 
>statistics calls.  2 - The contrasts matrix.  3 - Pulling out the 
>pairwise comparisons at the end.
>
>The adjusted p-value is still the same (.999_) in this 3 group 
>model.  So is the final conclusion that Group A and Group B are too 
>similar to show significant difference with this sample size / study 
>power?  And if I consider the unadjusted p-value, is the main 
>concern a high false positive rate?
>
>Thank you,
>Ramzi
>
>My Code:
>
>## Load Packages
>library(affy)       # Affymetrix pre-processing
>library(limma)      # two-color pre-processing; differential expression
>library(Biobase)
>
>## Read targets file.
>pd <- 
>read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as.is=TRUE)
>
>## Read .CEL data.
>rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
>
>## Normalize the data
>eset <- rma(rawAffyData)
>
>## The target file information can be recovered from the eset.
>targets <- pData(eset)
>
>## Define a design matrix.
>designMatrix <- model.matrix(~ -1 + 
>factor(targets$Target,levels=unique(targets$Target)))
>colnames(designMatrix) <- unique(targets$Target)
>numParameters <- ncol(designMatrix)
>parameterNames <- colnames(designMatrix)
>
>## Define a contrasts matrix.
>## Three comparisons:
>## 1. Col vs Exp
>## 2. Col vs WT
>## 3. Exp vs WT
>contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"),
>                     paste(parameterNames[1],parameterNames[3],sep="_vs_"),
>                     paste(parameterNames[2],parameterNames[3],sep="_vs_"))
>contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix))
>rownames(contrastsMatrix) <- parameterNames
>colnames(contrastsMatrix) <- contrastNames
>
># This is my contrasts matrix.  Collapsed represents Group A, 
>Expanding represents Group B, and WildType is. . . Wild Type.
>contrastsMatrix
>           Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType
>Collapsed                      1                     1                     0
>Expanding                     -1                     0                     1
>WildType                       0                    -1                    -1
>
>
>## Fit to linear model.
>fit <- lmFit(eset,design=designMatrix)
>
>## Fit a linear model for the contrasts.
>fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
>
>## Empirical Bayes statistics
>fit2  <- eBayes(fit2)
>
>numGenes <- nrow(eset)
>completeTable_Col_vs_Exp <- 
>topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes)
>write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>completeTable_Col_vs_WT <- 
>topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes)
>write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>completeTable_Exp_vs_WT <- 
>topTable(fit2,coef="Expanding_vs_WildType",number=numGenes)
>write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>
>
>________________________________________
>From: Jenny Drnevich [drnevich at illinois.edu]
>Sent: Thursday, June 23, 2011 4:38 PM
>To: Abboud, Ramzi; bioconductor at r-project.org
>Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question
>
>Hi Ramzi,
>
>There's nothing "wrong" with your results - there is simply not
>enough statistical evidence for differential expression between Group
>A and Group B. Is there a reason you are not following the
>limmaUserGuide() section 8.6, which goes through how to fit one model
>on 3 groups and pull out the pairwise comparisons you want? You'll
>get more power to detect differences between Groups A and B by
>fitting one model with all 3 groups instead of fitting separate models.
>
>Best,
>Jenny
>
>At 03:30 PM 6/23/2011, Abboud, Ramzi wrote:
> >Hello All,
> >
> >I am somewhat new to bioconductor, and am trying to accomplish what
> >I believe is a simple task.  I have 15 AffyMetrix gene signatures
> >from 15 subjects, each in the form of a .cel file.  The subjects are
> >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild
> >Type.  I would like to compare the average gene expression of
> >subjects in Group A to those in Group B, and separately compare the
> >average gene expression in Group A to Wild Type.  In the results I
> >would like the genes most significantly different in expression
> >levels and the p-values for each gene comparison.
> >
> >I have code which I believe does this, but the results do not seem
> >totally correct, and I would like some help.
> >
> >I have two very similar R scripts (to keep things separate and
> >simple).  One compares Group A to Group B.  The other compares Group
> >A to Wild Type.  The results from comparing Group A to Wild Type
> >look correct.  However, the results from comparing Group A to Group
> >B give an adjusted p value of 0.999992827573282 for every single
> >gene.  Here are the top two lines from the output file (columns are
> >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) :
> >
> >42970   1458675_at      -0.402322454    4.770279273     -4.522575441
> >    0.000900789     0.999992828     -3.268563539
> >23121   1438815_at      0.437319401     7.866319701     4.013307606
> >    0.002098968     0.999992828     -3.417357338
> >
> >Obviously something is not right.  All the other numbers from the
> >Group A vs Group B comparison look reasonable, but this adjusted p
> >value is making me doubt the whole thing.
> >
> >Does someone see a glaring and obvious mistake in my code (which is
> >included below)?  Is there a better or simpler way to do comparison?
> >
> >Please let me know if I can provide any additional information.  I
> >would be happy to provide the excel
> >
> >Thank you,
> >Ramzi
> >
> >The following code compares Group A with Group B.  It is my R
> >Script, with notes.
> >
> >## Load Packages
> >library(affy)       # Affymetrix pre-processing
> >library(limma)      # two-color pre-processing; differential expression
> >library(Biobase)
> >
> >## Read targets file.
> >pd <-
> >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a 
> s.is=TRUE)
> >
> >## Read .CEL data.
> >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> >
> >## Normalize the data
> >eset <- rma(rawAffyData)
> >
> >## The target file information can be recovered from the eset.
> >targets <- pData(eset)
> >
> >## Define a design matrix.
> >#designMatrix <- createDesignMatrix(eset)
> >designMatrix <- model.matrix(~ -1 +
> >factor(targets$Target,levels=unique(targets$Target)))
> >colnames(designMatrix) <- unique(targets$Target)
> >numParameters <- ncol(designMatrix)
> >parameterNames <- colnames(designMatrix)
> >
> >## Define a contrasts matrix.
> >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix
> >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"))
> >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix))
> >rownames(contrastsMatrix) <- parameterNames
> >colnames(contrastsMatrix) <- contrastNames
> >
> >## Fit to linear model.
> >fit <- lmFit(eset,design=designMatrix)
> >
> >## Empirical Bayes statistics
> >fitNoContrastsMatrix <- eBayes(fit)
> >
> >## Fit a linear model for the contrasts.
> >fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
> >
> >## Empirical Bayes statistics
> >fit2  <- eBayes(fit2)
> >
> >numGenes <- nrow(eset)
> >completeTable_A_vs_B <- topTable(fit2,number=numGenes)
> >write.table(completeTable_A_vs_B
> >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor