[BioC] Breaking the "most genes not differentially expressed" assumption

Paolo Innocenti paolo.innocenti at ebc.uu.se
Mon Apr 27 18:22:34 CEST 2009


Thanks for the quick reply,
here is my code, run in a fresh session:

require(affy)
miame <- read.MIAME("../stat_input_rawdata/hemiarray.miame")
phenodata <- read.AnnotatedDataFrame(
	"../stat_input_rawdata/hemiarray.phenodata",sep=" ")
data <- ReadAffy(
	filenames=phenodata$filenames,
	sampleNames=sampleNames(phenodata),
	phenoData=phenodata,
	description=miame,
	celfile.path="../stat_input_rawdata")
eset <- rma(data)
require(limma)
f.sex <- factor(pData(data)$sex)
design <- model.matrix(~0 + f.sex)
colnames(design) <- levels(f.sex)
fit <- lmFit(eset,design)
contrast <- makeContrasts(M-F, levels=design)
fit2 <- contrasts.fit(fit,contrast)
fit2 <- eBayes(fit2)
pval <- 0.0001
adjusted <- "fdr"
res <- decideTests(fit2,
	adjust=adjusted,
	method="separate",
	p.value=pval)

My results:

 > summary(res)
    M - F
-1 11653
0   1442
1   5857

And my sessioInfo():

 > sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] affy_1.20.0   Biobase_2.2.1 limma_2.16.3

loaded via a namespace (and not attached):
[1] affyio_1.10.1        preprocessCore_1.4.0

----
Best,
paolo


Sean Davis wrote:
> 
> 
> On Mon, Apr 27, 2009 at 11:25 AM, Paolo Innocenti 
> <paolo.innocenti at ebc.uu.se <mailto:paolo.innocenti at ebc.uu.se>> wrote:
> 
>     Hi all,
> 
>     I have dataset of 120 Affy arrays, 60 males and 60 females.
>     The expression profiles of the 2 groups differs dramatically, i.e.
>     if I run a standard RMA + limma, I have ~90% of the genes
>     differentially expressed. 
> 
> 
> You'll probably want to provide the code that you used to perform the 
> RMA and limma analyses.  It is difficult to comment on the results of an 
> analysis without seeing what was done to produce them.  Also, be sure to 
> provide sessionInfo() in case there are questions about what packages 
> are used. 
> 
> Sean
> 
>  
> 
>     Also, downregulated genes are twice as many than upregulated genes,
>     although if I impose a cutoff of two-fold difference in expression,
>     they are almost equal (15% up and 15% down).
>     This is clearly breaking the assumption that most of the genes on
>     the array should not be differentially expressed, but the result is
>     in line with the current knowledge of sex-biased gene expression in
>     my model organism.
> 
>     I have done some quality control plots, available here:
>     - Boxplot:
>     http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_boxplot.pdf
> 
>     - Frequency histogram:
>     http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_histogram.pdf
> 
>     - RLE and NUSE plots:
>     http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_RLEandNUSE1.pdf
> 
>     - CorrelationPlot:
>     http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_correlationplot.pdf
> 
>     - PCA, after RMA normalization:
>     http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_pca.pdf
> 
>     Now, my questions are:
>     1) Is my issue really a issue? If so, how can I perform a robust
>     normalization of my arrays?
> 
>     2) Is there a tool to assess how "robust" your pre-processing method
>     is in respect to this issue?
> 
>     3) Sex-biased gene expression is not the only biological question in
>     my experiment. Is the massive size of this effect going to affect
>     the "detectability" of other smaller effects? (through normalization
>     or correction for multiple testing or other?)
> 
>     Thanks,
>     paolo
> 
> 
>     -- 
>     Paolo Innocenti
>     Department of Animal Ecology, EBC
>     Uppsala University
>     Norbyvägen 18D
>     75236 Uppsala, Sweden
> 
>     _______________________________________________
>     Bioconductor mailing list
>     Bioconductor at stat.math.ethz.ch <mailto:Bioconductor at stat.math.ethz.ch>
>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>     Search the archives:
>     http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 

-- 
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyvägen 18D
75236 Uppsala, Sweden



More information about the Bioconductor mailing list