[BioC] Analysing Affy data with limma

Thu Dec 12 16:20:30 CET 2013

Hi January,

I think what you are doing is probably fine. From the paper:

"Affymetrix CEL files were imported, summarized using the GC-RMA 
algorithm, and baseline-normalized to the median of all arrays."

So that's pretty close to what you would get if you just took the 
celfiles yourself and ran them through affy or gcrma (or xps or frma or 
SCAN.UPC), except for the location change from shifting the data by the 
median. And I wouldn't be too surprised by very small p-values because 
you have a) 27 samples/group, and b) they have tuberculosis, which I 
would expect to get the various white blood cells really worked up.

However, I believe you are ignoring the fact that these data are paired, 
and should be including a 'patient' factor to account for the pairing.

Best,

Jim

On 12/12/2013 9:50 AM, January Weiner wrote:
> Dear all,
>
> I'm trying to analyse a data set reposited in GEO. In theory, I could
> download the CEL files and do everything from scratch, but I would prefer
> to use the GEO-reposited, normalized data.
>
> I load the data with
>
> library( GEOquery )
> g <- getGEO( "GSE31348" )[[1]]
>
> So far, so good. The data is supposed to be GCrma-normalized. Density plots
> show that the arrays have a symmetrical distribution around 0, so they are
> not logarithmized signals.
>
> My question is how should I proceed with the data if I want to use limma to
> analyse it? Or should I use the original CEL files and a standard approach
> (as described, for example, in the limma guide)?
>
> If I dumbly run limma, I am getting more or less the results I expect, but
> the p-values seem to me overly optimistic:
>
> g2 <- new( "EList", list( targets= as( phenoData( g ), "data.frame" ),
> genes= as( featureData( g ), "data.frame" ), E= exprs( g ) ) )
> g2$targets$tp <- factor( gsub( "time point: Week ", "tp.",
> g2$targets$characteristics_ch1.2 ) )
> d <- model.matrix( ~ 0 + g2$targets$tp )
> colnames( d ) <- levels( g2$targets$tp )
> fit <- eBayes( contrasts.fit( lmFit( g2, d ), makeContrasts( "tp.0-tp.26",
> levels= d ) )
>
> I think that this is incorrect.
>
> Kind regards,
> j.
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099