[BioC] Finding differentially expressed genes using Ebayes
j.m.herbert at bham.ac.uk
Mon May 11 17:55:32 CEST 2009
Thanks for all your suggestions and help, Sean, James and Saroj.
Before I saw the last two messages, I tried this based on the Limma manual and it seems to agree with Samr now.
design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2)));
colnames(design) <- c("group1", "group2");
fit <- lmFit(eset, design);
contrast.matrix <- makeContrasts(group2-group1, levels=design);
fit2 <- contrasts.fit(fit, contrast.matrix);
fit2 <- eBayes(fit2);
topTable(fit2, coef=1, adjust="fdr")
I assume this another way of doing Jame's suggestion.
Thanks a lot again.
Molecular Angiogenesis group
First Floor, Institute of Biomedical Research
The University of Birmingham
B15 2TT (UK)
j.m.herbert at bham.ac.uk
Tel: +44 121 414 3733
Fax: +44 121 415 8677
From: Saroj K Mohapatra [mailto:saroj at vt.edu]
Sent: Mon 11/05/2009 14:42
To: John Herbert
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Finding differentially expressed genes using Ebayes
I am trying to understand the situation. Creating a contrast matrix
(Sean's suggestion) is a very good idea. I usually do it that way.
For the current problem, the design matrix would do the following.
Assuming that the samples 1:3 belong to normal and 4:6 belong to cancer.
> design <- model.matrix(~factor(c(1,1,1,2,2,2)))
(Intercept) factor(c(1, 1, 1, 2, 2, 2))2
1 1 0
2 1 0
3 1 0
4 1 1
5 1 1
6 1 1
 0 1
attr(,"contrasts")$`factor(c(1, 1, 1, 2, 2, 2))`
Here 'design' constitutes the matrix X in the linear models:
y_g = X*beta_g + epsilon_g
where beta is a 2-vector of unknown parameters defining overall and
differential expression averages for gene g, y is the log(R/G) for gene
g, and epsilon_g is a zero-mean random disturbance
In this application, X is 6 x 2. for the first 3 samples on gene g we have,
y_g = beta1_g + epsilon_g
and for the 4th through the sixth samples we have
y_g = beta1_g + beta2_g + epsilon_g
beta1 is thus the average log ratio comparing the normal to pooled
reference, and beta2 is mean log(Cancer/Ref)-log(Normal/Ref) = mean
Therefore, I think, using the second coef in the function topTable would
produce result for the comparison of interest (cancer vs normal):
probes_interesting <- topTable(fit, coef=2, n=20);
Let me quickly add that I am relatively new to limma; please confirm my
approach or correct if I am wrong some where.
John Herbert wrote:
> I have 6 gpr files, 3 from cancer and 3 from normals. I have normalised them with printtiplowess and removed the control spots. My question is about finding differentially expressed genes with Ebayes. I have run the code below and get some strange results that are very different to using Samr. I am thinking I trust samr more as when I look at the results by eye, they seem to order the genes coffectly based on fold ratio. However, it could be that I am not using ebayes correctly.
> So, the design is supposed to represent cancer (the 1s) and normal (the 2s). Is this the correct way to construct a design matrix? I want to compare gene expression between cancer and normal (3 replicates each).
> design <- model.matrix(~factor(c(1,1,1,2,2,2)));
> fit <- lmFit(probes_only.imp, design);
> fit <- eBayes(fit);
> probes_only.imp_50_ebayes <- topTable(fit, n=20);
> Any help appreciated on this, thank you.
> Kind regards,
> Bioinformatics Officer
> Molecular Angiogenesis group
> First Floor, Institute of Biomedical Research
> The University of Birmingham
> Medical School
> B15 2TT (UK)
> j.m.herbert at bham.ac.uk
> Tel: +44 121 414 3733
> Fax: +44 121 415 8677
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor