[BioC] Questions about model matrix, logFC and adjusted P.value of t-test
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Oct 28 05:02:32 CEST 2010
Dear Mingkwan,
> Date: Wed, 27 Oct 2010 00:52:22 +0200
> From: Mingkwan.Nipitwattanaphon at unil.ch
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Questions about model matrix, logFC and adjusted
> P.value of t-test
>
> Dear Bio-C users,
>
> My samples are queen ants. I use two-color spotted microarrays,
> hybridisation against a reference, no dye swaps. I would like to compare
> samples with different genotypes, developmental stages and social form.
>
> There are 3 genotypes: 1. BB (D=dominant), 2. Bb (H=heterozygous) and 3.
> bb (R=recessive).
>
> Three developmental stages: 1. young (2d) virgin queen, 2. mature (11d)
> virgin queen, 3. mated/mother queen (mom)
>
> From these 3 genotypes, 3 developmental stages and 2 types of social
> form, I can group my 99 slides into 9 different categories. These slides
> contain two batches (I and J series). I treated batch effect as a fixed
> effect. My model matrix is:
>
> design <-
> model.matrix(~0+factor(targets$Cy3)+factor(targets$batch))
>
> colnames(design) <- c("P2dD", "P2dH", "P11dD", "P11dH",
> "P11dR", "M2dD", "M11dD", "MomD", "MomH", "batch")
>
> design
> P2dD P2dH P11dD P11dH P11dR M2dD M11dD MomD MomH batch
> 1 0 0 0 0 0 1 0 0 0 0
...
> 99 0 0 0 1 0 0 0 0 0 1
>
> fitMA <- lmFit(normalizedMA, design)
>
> There are 36 possible tests that I can make but I am only interested in
> 16 tests below.
>
> contrastALL16 <- makeContrasts(P2dD-P2dH, P11dD-P11dH, P11dH-P11dR,
> P11dD-P11dR, M2dD-P2dD, M11dD-P11dD, MomD-MomH, P2dD-P11dD, P11dD-MomD,
> P2dD-MomD, P2dH-P11dH, P11dH-MomH, P2dH-MomH, M2dD-M11dD, M11dD-MomD,
> M2dD-MomD, levels=design)
>
> fitContrast.MA <- contrasts.fit(fitMA, contrastALL16)
> fit_eBayes_MA <- eBayes(fitContrast.MA)
>
> write.table(fit_eBayes_MA, file="Q16contrasts.txt",
> sep="\t")
>
> My result file contains coefficients of all the 16 contrasts that I
> asked for and the p-value of each contrast (each t-test) but NOT the
> adjusted p-value. It also gives me the F value and the p-value of the
> F-test and again NOT the adjusted p-value of the F-test.
write.table() is general purpose rather than a limma command. You could
use write.fit(), which would give you adjusted p-values.
> I can get the adjusted p-value from the F-test by using the command
> "p.adjust" but not with the t-test. When I used the command "topTable"
> with coeff=1 (until 16 each time for all of my 16 contrasts), I can get
> the adjusted p-value of each contrast.
>
> My questions are:
>
> 1. Why does not the command "eBayes" give adjusted p-value?
Before you can compute adjusted p-values, you need to specify what is the
universe of tests that you want to adjust over. eBayes is concerned with
computations that apply to any subsequent testing that you choose to do.
> Is there an easier or more direct way to get adjusted p-value of the
> t-test?
If topTable() is not sufficient for you, why not decideTests() or
write.fit()? What are you planning to do with the adjusted p-values?
> 2. How does the logFC calculate from?
As I'm sure you already know, limma estimates the logFCs by fitting a
linear model.
> If I took the M-values for a single spot, after normalisation between
> arrays from slides that are belonged to one of my contrasts (8 slides of
> momD vs 8 slides of momH, because this case all slides are from the same
> batch)
>
> M-value of momD slide 1 to 8 = 3.00, 3.26, 2.73, 3,32, 2.93,
> 2.81, 2.55, 2.85
> M-value of momH slide 1 to 8 = -0.44, -0.54, 0.03, -0.38,
> 0.49, -0.56, 0.07, 0.37
>
> The mean M-value is -0.12 for momH and 2.93 for momD. I'd expect that
> the relative expression level in momD compared to momH would be:
> (2^(2.93))/(2^(-0.12)) = 8.29. The logFC should simply be: log2(8.29) =
> 3.05.
This simple calculation would be appropriate only if momD and momH were
the only conditions in your experiment. But you have batch effects,
genotypes and other variables. Taking these other factors into account
will generally affect the fold-change etc.
> However, the logFC given by limma is 3.37.
Which is presumably the correct value after correcting for the batch
effect and adjusting for the effects of other variables in your model.
> 3. If I want to do more complicated model like: ~1 + Age + Geno type
> *nested within* Social form + Age : Genotype *nested within* Social
> form, (fixed factor = Batch). Is it possible to do? How can I do it in
> limma?
limma can work with any linear model formula that is understood by R. For
example you can nest Age and Genotype within Social form by
design <- model.matrix(~ (Age+Genotype) %in% SocialForm)
See page 51 of the manual "An Introduction to R", which comes as part of
the R installation.
Best wishes
Gordon
> I am really sorry for writing such a long email because I want to make
> everything clear. I really appreciate your help.
>
> best regards,
> Mingkwan
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list