[BioC] Comparing Two Affymetrix Arrays Question

Thu Jun 23 22:57:11 CEST 2011

Hi Ramzi,

Please keep all replies on the BioC list. Yes, it possible to have 
every single gene have the same adjusted p-value. You should look 
into how the False Discovery Rate correction works to see why this is 
the case. In short, it's because 0.999992828 was the lowest p-value 
of all genes after the first adjustment, but it was at the end of the 
ranked list, so all the genes above it also got this p-value.

Jenny

At 03:46 PM 6/23/2011, Abboud, Ramzi wrote:
>Jenny,
>
>Thank you so much for the fast reply.  I had used all 3 groups 
>originally, but simplified it because I was confused about this 
>p-value thing.  So you think it is possible that the adjusted 
>p-value for every single gene in the data set would have the same 
>value?  Is that 0.999992828 value simply the highest that is 
>returned by the function?
>
>I will revert to the 3 group model and repost once I have it working.
>
>Thanks,
>Ramzi
>________________________________________
>From: Jenny Drnevich [drnevich at illinois.edu]
>Sent: Thursday, June 23, 2011 4:38 PM
>To: Abboud, Ramzi; bioconductor at r-project.org
>Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question
>
>Hi Ramzi,
>
>There's nothing "wrong" with your results - there is simply not
>enough statistical evidence for differential expression between Group
>A and Group B. Is there a reason you are not following the
>limmaUserGuide() section 8.6, which goes through how to fit one model
>on 3 groups and pull out the pairwise comparisons you want? You'll
>get more power to detect differences between Groups A and B by
>fitting one model with all 3 groups instead of fitting separate models.
>
>Best,
>Jenny
>
>At 03:30 PM 6/23/2011, Abboud, Ramzi wrote:
> >Hello All,
> >
> >I am somewhat new to bioconductor, and am trying to accomplish what
> >I believe is a simple task.  I have 15 AffyMetrix gene signatures
> >from 15 subjects, each in the form of a .cel file.  The subjects are
> >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild
> >Type.  I would like to compare the average gene expression of
> >subjects in Group A to those in Group B, and separately compare the
> >average gene expression in Group A to Wild Type.  In the results I
> >would like the genes most significantly different in expression
> >levels and the p-values for each gene comparison.
> >
> >I have code which I believe does this, but the results do not seem
> >totally correct, and I would like some help.
> >
> >I have two very similar R scripts (to keep things separate and
> >simple).  One compares Group A to Group B.  The other compares Group
> >A to Wild Type.  The results from comparing Group A to Wild Type
> >look correct.  However, the results from comparing Group A to Group
> >B give an adjusted p value of 0.999992827573282 for every single
> >gene.  Here are the top two lines from the output file (columns are
> >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) :
> >
> >42970   1458675_at      -0.402322454    4.770279273     -4.522575441
> >    0.000900789     0.999992828     -3.268563539
> >23121   1438815_at      0.437319401     7.866319701     4.013307606
> >    0.002098968     0.999992828     -3.417357338
> >
> >Obviously something is not right.  All the other numbers from the
> >Group A vs Group B comparison look reasonable, but this adjusted p
> >value is making me doubt the whole thing.
> >
> >Does someone see a glaring and obvious mistake in my code (which is
> >included below)?  Is there a better or simpler way to do comparison?
> >
> >Please let me know if I can provide any additional information.  I
> >would be happy to provide the excel
> >
> >Thank you,
> >Ramzi
> >
> >The following code compares Group A with Group B.  It is my R
> >Script, with notes.
> >
> >## Load Packages
> >library(affy)       # Affymetrix pre-processing
> >library(limma)      # two-color pre-processing; differential expression
> >library(Biobase)
> >
> >## Read targets file.
> >pd <-
> >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a 
> s.is=TRUE)
> >
> >## Read .CEL data.
> >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> >
> >## Normalize the data
> >eset <- rma(rawAffyData)
> >
> >## The target file information can be recovered from the eset.
> >targets <- pData(eset)
> >
> >## Define a design matrix.
> >#designMatrix <- createDesignMatrix(eset)
> >designMatrix <- model.matrix(~ -1 +
> >factor(targets$Target,levels=unique(targets$Target)))
> >colnames(designMatrix) <- unique(targets$Target)
> >numParameters <- ncol(designMatrix)
> >parameterNames <- colnames(designMatrix)
> >
> >## Define a contrasts matrix.
> >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix
> >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"))
> >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix))
> >rownames(contrastsMatrix) <- parameterNames
> >colnames(contrastsMatrix) <- contrastNames
> >
> >## Fit to linear model.
> >fit <- lmFit(eset,design=designMatrix)
> >
> >## Empirical Bayes statistics
> >fitNoContrastsMatrix <- eBayes(fit)
> >
> >## Fit a linear model for the contrasts.
> >fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
> >
> >## Empirical Bayes statistics
> >fit2  <- eBayes(fit2)
> >
> >numGenes <- nrow(eset)
> >completeTable_A_vs_B <- topTable(fit2,number=numGenes)
> >write.table(completeTable_A_vs_B
> >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor