# [BioC] limma, subsets of design matrix and p-values

Mike Schaffer mschaff at bu.edu
Thu Mar 10 20:15:29 CET 2005

```
I've run limma for a few months and have a question about the p-values
being calculated from a large set of data vs. just a subset.

All of my 2-color array data is relative to an untreated sample and
each is replicated three times.
For example:
Treatment1 vs. untreated x 3,
Treatment2 vs. untreated x 3,
Treatment3 vs. untreated x 3
...etc.

So I have a large MA object of all the data that is normalized within
arrays and a design matrix created by:

design <- modelMatrix(targets,ref="untreated")

My confusion stems from the fact that if I run eBayes on the entire
dataset (code below), I get different p-values (but the same M values)
for TreatmentX vs. untreated than if I fit only the subset of arrays
for one treatment (e.g. just Treatment1 vs. untreated).

For example,

fit <- lmFit(MA,design=design)
eb <- eBayes(fit)
eb$p.value[1:10,1]

gives different p-values than if I were to only initially subset on the
Treatment1 vs. untreated data, and then run lmFit.
For example,

design <- modelMatrix(targets[1:3,],ref="untreated")
# the first three data sets include ALL of the Treatment1 vs. untreated data
fit <- lmFit(MA[,1:3],design=design)
eb <- eBayes(fit)
eb$p.value[1:10]

Am I incorrect to assume that the p-values should be the same
regardless of how much data is included in the MA object, as long as
the design matrix has no overlap between experiments (e.g. treatment1
vs. untreated data is distinct from treatment2 vs. untreated data) --
aside from the fact they are all relative to an untreated sample?

Or is the moderated t-statistic based on ALL of the data in the MA
object regardless of each experiment's relationship to others?

I'd like to read in all the data and keep it in one large RG and MA
object.  If I'm looking to determine the p-values for genes induced
relative to untreated by just one of the treatments, should I use lmFit
on the large MA object or should it be subset first (e.g. MA[,1:3])
before doing the linear fit?

Am I missing something?

Thanks, in advance, for any help.

```
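
A minimal sketch of where such a difference can come from, using base R `lm` on simulated log-ratios for a single gene rather than limma itself. The data, the seed, and all variable names here are assumptions made for illustration, and the model assumes one coefficient per treatment, similar in shape to what `modelMatrix(targets,ref="untreated")` would give for the layout described above. The Treatment1 estimate is the same in both fits, but the full fit pools the residual variance over all nine arrays (6 residual degrees of freedom), while the subset fit uses only three arrays (2 degrees of freedom), so the standard error, t-statistic, and p-value change.

```r
# Hypothetical data: one gene, 3 treatments x 3 replicate arrays,
# each value standing in for a log-ratio (M) against the untreated reference.
set.seed(1)
treatment <- factor(rep(c("T1", "T2", "T3"), each = 3))
M <- c(rnorm(3, mean = 1, sd = 0.2),   # Treatment1 vs. untreated
       rnorm(3, mean = 0, sd = 0.8),   # Treatment2 vs. untreated
       rnorm(3, mean = 2, sd = 0.8))   # Treatment3 vs. untreated

# Full fit: one coefficient per treatment, no intercept.
# The residual variance is pooled across all nine arrays.
fit_full <- lm(M ~ 0 + treatment)

# Subset fit: only the three Treatment1 arrays, a single coefficient.
fit_sub <- lm(M[1:3] ~ 1)

# Pull out the Treatment1 row (estimate, SE, t, p) from each fit.
row_full <- summary(fit_full)$coefficients["treatmentT1", ]
row_sub  <- summary(fit_sub)$coefficients["(Intercept)", ]

comparison <- rbind(full   = c(row_full, df = df.residual(fit_full)),
                    subset = c(row_sub,  df = df.residual(fit_sub)))
print(comparison)
```

If `lmFit` behaves like this per gene, the choice between the two approaches comes down to whether it is reasonable to assume that a gene has the same variance under every treatment. The `eBayes` moderation works on top of these per-gene variances and degrees of freedom, so it would be expected to carry the difference through rather than remove it.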