[BioC] Comparing Two Affymetrix Arrays Question

Thu Jun 30 14:39:46 CEST 2011

Jenny,

Thank you for the reply.

I just wanted to post to round things out.  I have loaded my data into GSEA and am seeing what I can learn in that environment.  It is relatively simple to use, the difficulty is in interpreting the results.

Thanks to all for the help.

Ramzi

________________________________________
From: Jenny Drnevich [drnevich at illinois.edu]
Sent: Wednesday, June 29, 2011 9:37 AM
To: Abboud, Ramzi; bioconductor at r-project.org
Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question

Hi Ramzi,

As far as I can tell, your code looks correct. So yes, group A and
group B are too similar to show significant differences after
correction for multiple testing. That's not the same as saying there
are no genes that are differentially expressed between the two, just
that you can't pick them out of the expected false positives. You
could also see this visually by doing some sort of hierarchical
clustering and/or PCA clustering on your samples - Wild type sample
would probably cluster together, but the Group A and Group B would be
intermingled. You can still rank your genes in the A vs. B comparison
and looks among the top genes to look for interesting candidates -
you could do qPCR on additional samples for these potential
candidates to confirm them. You could also do something like GSEA
that uses the entire ranked list of genes to see if any pathways or
GO terms are over-represented at the top of the list; if there were
truly no significant differences, then the placement of genes should
be random in your lists.

HTH,
Jenny

At 08:44 AM 6/24/2011, Abboud, Ramzi wrote:
>Jenny et al.,
>
>Thanks again for the help so far.
>
>I did some reading about FDR and it makes perfect sense.  If you are
>considering 45,000 genes it is much more likely you will see a
>difference between the two data sets, and this needs to be corrected for.
>
>Following your advice, I reverted my code back to fitting one model
>on all 3 groups in order to get more power.  I then pull out the
>pairwise comparisons that I want and write them out to files.  The
>comparisons again are Group A vs Group B, Group A vs Wild Type, and
>Group B vs Wild Type.  Below is my code, does it look correct? There
>are three main things I'd like to confirm are correct.  1 - The
>statistics calls.  2 - The contrasts matrix.  3 - Pulling out the
>pairwise comparisons at the end.
>
>The adjusted p-value is still the same (.999_) in this 3 group
>model.  So is the final conclusion that Group A and Group B are too
>similar to show significant difference with this sample size / study
>power?  And if I consider the unadjusted p-value, is the main
>concern a high false positive rate?
>
>Thank you,
>Ramzi
>
>My Code:
>
>## Load Packages
>library(affy)       # Affymetrix pre-processing
>library(limma)      # two-color pre-processing; differential expression
>library(Biobase)
>
>## Read targets file.
>pd <-
>read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as.is=TRUE)
>
>## Read .CEL data.
>rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
>
>## Normalize the data
>eset <- rma(rawAffyData)
>
>## The target file information can be recovered from the eset.
>targets <- pData(eset)
>
>## Define a design matrix.
>designMatrix <- model.matrix(~ -1 +
>factor(targets$Target,levels=unique(targets$Target)))
>colnames(designMatrix) <- unique(targets$Target)
>numParameters <- ncol(designMatrix)
>parameterNames <- colnames(designMatrix)
>
>## Define a contrasts matrix.
>## Three comparisons:
>## 1. Col vs Exp
>## 2. Col vs WT
>## 3. Exp vs WT
>contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"),
>                     paste(parameterNames[1],parameterNames[3],sep="_vs_"),
>                     paste(parameterNames[2],parameterNames[3],sep="_vs_"))
>contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix))
>rownames(contrastsMatrix) <- parameterNames
>colnames(contrastsMatrix) <- contrastNames
>
># This is my contrasts matrix.  Collapsed represents Group A,
>Expanding represents Group B, and WildType is. . . Wild Type.
>contrastsMatrix
>           Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType
>Collapsed                      1                     1                     0
>Expanding                     -1                     0                     1
>WildType                       0                    -1                    -1
>
>
>## Fit to linear model.
>fit <- lmFit(eset,design=designMatrix)
>
>## Fit a linear model for the contrasts.
>fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
>
>## Empirical Bayes statistics
>fit2  <- eBayes(fit2)
>
>numGenes <- nrow(eset)
>completeTable_Col_vs_Exp <-
>topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes)
>write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>completeTable_Col_vs_WT <-
>topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes)
>write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>completeTable_Exp_vs_WT <-
>topTable(fit2,coef="Expanding_vs_WildType",number=numGenes)
>write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>
>
>________________________________________
>From: Jenny Drnevich [drnevich at illinois.edu]
>Sent: Thursday, June 23, 2011 4:38 PM
>To: Abboud, Ramzi; bioconductor at r-project.org
>Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question
>
>Hi Ramzi,
>
>There's nothing "wrong" with your results - there is simply not
>enough statistical evidence for differential expression between Group
>A and Group B. Is there a reason you are not following the
>limmaUserGuide() section 8.6, which goes through how to fit one model
>on 3 groups and pull out the pairwise comparisons you want? You'll
>get more power to detect differences between Groups A and B by
>fitting one model with all 3 groups instead of fitting separate models.
>
>Best,
>Jenny
>
>At 03:30 PM 6/23/2011, Abboud, Ramzi wrote:
> >Hello All,
> >
> >I am somewhat new to bioconductor, and am trying to accomplish what
> >I believe is a simple task.  I have 15 AffyMetrix gene signatures
> >from 15 subjects, each in the form of a .cel file.  The subjects are
> >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild
> >Type.  I would like to compare the average gene expression of
> >subjects in Group A to those in Group B, and separately compare the
> >average gene expression in Group A to Wild Type.  In the results I
> >would like the genes most significantly different in expression
> >levels and the p-values for each gene comparison.
> >
> >I have code which I believe does this, but the results do not seem
> >totally correct, and I would like some help.
> >
> >I have two very similar R scripts (to keep things separate and
> >simple).  One compares Group A to Group B.  The other compares Group
> >A to Wild Type.  The results from comparing Group A to Wild Type
> >look correct.  However, the results from comparing Group A to Group
> >B give an adjusted p value of 0.999992827573282 for every single
> >gene.  Here are the top two lines from the output file (columns are
> >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) :
> >
> >42970   1458675_at      -0.402322454    4.770279273     -4.522575441
> >    0.000900789     0.999992828     -3.268563539
> >23121   1438815_at      0.437319401     7.866319701     4.013307606
> >    0.002098968     0.999992828     -3.417357338
> >
> >Obviously something is not right.  All the other numbers from the
> >Group A vs Group B comparison look reasonable, but this adjusted p
> >value is making me doubt the whole thing.
> >
> >Does someone see a glaring and obvious mistake in my code (which is
> >included below)?  Is there a better or simpler way to do comparison?
> >
> >Please let me know if I can provide any additional information.  I
> >would be happy to provide the excel
> >
> >Thank you,
> >Ramzi
> >
> >The following code compares Group A with Group B.  It is my R
> >Script, with notes.
> >
> >## Load Packages
> >library(affy)       # Affymetrix pre-processing
> >library(limma)      # two-color pre-processing; differential expression
> >library(Biobase)
> >
> >## Read targets file.
> >pd <-
> >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a
> s.is=TRUE)
> >
> >## Read .CEL data.
> >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> >
> >## Normalize the data
> >eset <- rma(rawAffyData)
> >
> >## The target file information can be recovered from the eset.
> >targets <- pData(eset)
> >
> >## Define a design matrix.
> >#designMatrix <- createDesignMatrix(eset)
> >designMatrix <- model.matrix(~ -1 +
> >factor(targets$Target,levels=unique(targets$Target)))
> >colnames(designMatrix) <- unique(targets$Target)
> >numParameters <- ncol(designMatrix)
> >parameterNames <- colnames(designMatrix)
> >
> >## Define a contrasts matrix.
> >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix
> >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"))
> >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix))
> >rownames(contrastsMatrix) <- parameterNames
> >colnames(contrastsMatrix) <- contrastNames
> >
> >## Fit to linear model.
> >fit <- lmFit(eset,design=designMatrix)
> >
> >## Empirical Bayes statistics
> >fitNoContrastsMatrix <- eBayes(fit)
> >
> >## Fit a linear model for the contrasts.
> >fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
> >
> >## Empirical Bayes statistics
> >fit2  <- eBayes(fit2)
> >
> >numGenes <- nrow(eset)
> >completeTable_A_vs_B <- topTable(fit2,number=numGenes)
> >write.table(completeTable_A_vs_B
> >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor