[BioC] Limma issues.
Marcus
marcusb at biotech.kth.se
Tue Nov 11 21:51:51 MET 2003
Hi limma users.
I have some general questions regarding the use of dupcor.series and
gls.series functions.
I use an experimental design that compares three different conditions to a
fourth (which I will call a reference).
So the design of the experiment looks something like:
Cond 1 Cond2 Cond2
Ref
(There are supposed to be double arrows between the different conditions
and the reference.)
The experiment is done in four (and between some conditions, five)
biological replicates, so in total I have made 4 times 6 (dye-swap) = 24
hybridizations which compares all the conditions with the reference, plus
four additional hybridizations which is a dye-swap between cond2 and
reference plus cond3 and the reference. This gives in total 27 slides since
one slide was excluded due to a terrible hybridization. Each slide has a
duplicate spot.
This data has been filtered using different filter cut-offs and the
remaining data has been put into a matrix of M-values. The data has been
normalized and dye-swapped.
From this point I have made two different analysis paths.
One analysis path was to divide the M-matrix into three different matrices
which contains data from the different conditions and I continued using the
matrices with the following
command: dupcor.series(M.matrix.cond1,design,ndups=2,spacing=13824)->corr
Where the design was something like: design <- c(1,1,1,1,1,1,1,1)
The correlation is later used in the function: gls.series(M.matrix.cond1,
design, ndups=2,spacing=13824,corr$cor)->fit
and continuing with: eb<-ebayes(fit)
the data from ebayes is later used in the toptable
function: toptable(number=100,genelist=list[,],fit=fit,eb=eb,adjust="fdr")
When I make this analysis I get a list of 82 genes for cond1 that has a
B-value larger than zero.
But if I use another strategy where I use all the slides in the same
analysis and use a design matrix which looks something like:
Con1 Con2 Con3
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 0
7 1 0 0
8 1 0 0
9 0 1 0
10 0 1 0
11 0 1 0
12 0 1 0
13 0 1 0
14 0 1 0
15 0 1 0
16 0 1 0
17 0 1 0
18 0 1 0
19 0 0 1
20 0 0 1
21 0 0 1
22 0 0 1
23 0 0 1
24 0 0 1
25 0 0 1
26 0 0 1
27 0 0 1
And make the function calls like the procedures above except the toptable
where I specify the coef number to be the condition which I want to
investigate. The thing is, when I use this approach I get more genes that
have a b-value greater than zero, namely 110.
My guess is that when I run the dupcor.series the correlations is based on
all the slides in some way, since corr$cor is a lonely number and not a
correlation for every comparison in the experimental design. The cor
numbers from the different analysis pathways are not equal either.
When I run the gls.series function I get a fit$sigma that is based on all
the slides or?
What is actually happening when the sigma is calculated. The sigma is also
generally smaller when all the slides are included compared to the analysis
path where I used one condition at a time, which I guess should be the case
if all slides are included.
The lods which is calculated in the ebayes function is larger then the lods
which originates from the analysis path where I examined one condition at a
time, so this result in more genes that is larger than zero.
So to my questions (sorry for the long introduction):
Is the design matrix correct when the experiment is performed the way it
is? According to the user guide, this is an optional way when you have a
reference design but since the result is deviating from the one comparison
at a time I am not sure.
If the design is correct, which of the two analysis pathways is the
correct/better one?
I dont consider myself as a statistician, more like a biologist, but I
reckon the sigma factor as a crucial variable in these analyses. I changed
the fit$sigma for the "one condition" analysis path to the sigma that
originates from the analysis that contains all the slides and the number of
genes that reach above B-value zero increased to 111 which is quite like
the 110 that I got from the other analysis pathway. So I guess that the
sigma factor is quite responsible for the B-value spectra.
If I would like to compare, lets say the cond2 with cond3 can I then add a
another column to the design matrix, and if so, how would it look like.
During these analysis I have used R-version 1.8 and limma version 1.3.1
I would be very grateful for some response.
Regards
Marcus
*******************************************************************************************
Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office): +46 8 553 783 45
Fax: + 46 8 553 784 81
Visiting adress: Roslagstullsbacken 21, Floor 3
Delivery adress: Roslagsvägen 30B
More information about the Bioconductor
mailing list