[BioC] EdgeR for proteomics
Mark Robinson
mark.robinson at imls.uzh.ch
Fri Oct 19 07:25:45 CEST 2012
Dear Fabricio,
A couple quick comments below …
On 15.10.2012, at 17:41, Fabricio Marchini wrote:
> Thank you Mark,
>
> My absence of clearness is because I'm having trouble to figure out how to deal with the data.
>
> at your first question, actually there is a missing command line:
>
> > disp <- estimateCommonDisp(b)
> > disp$common.dispersion = 0.0001004979
>
> > disp <- estimateGLMCommonDisp(b)
> > disp$common.dispersion = 3.999943
Read the docs for estimateGLMCommonDisp -- ?estimateGLMCommonDisp
You'll want to specify your design matrix here in the call the estimateGLMCommonDisp(), otherwise it calculates dispersion relative to just an intercept model (i.e. all groups have same mean). That is probably why your estimate is so high.
Do you have real biological replicates here? Given the very low dispersion with estimateCommonDisp(), I would guess no. In that case, read the edgeR users guide -- 2.9 What to do if you have no replicates.
> I was doing some tests about dispersions, I'm still not certain wich one should I use.
>
> The second question:
>
> I have 2 comtrols:
> MO6h MO24h (these are my controls through time)
> and 2 treatments for both times:
> La6h La24h (La treatment)
> Lm6h Lm24h (Lm treatment)
>
> My biological questions are, 1) do I have modulation for each treatment through time and 2) there are differences for the same time for both treatments?
> I do not know how should I use my controls for both situations. For the situation that I'm looking for each time point I could use a intercept design for each time point, like:
>
> design <- model.matrix(~group, data=c$samples)
The comment from my previous message still applies -- split your variables into treatment and time and create a design matrix (and maybe contrast matrix) according to the questions you want to answer. Have a look at the limma/edgeR user's guide for some examples, or ask a local statistician.
Best regards, Mark
>
> But I do not know if that is correct and how to analyse each treatment dealing with the controls. The best situation would be to deal first with the controls and them the treated samples through time, any ideia?
>
> --
> Fabricio K. Marchini
> Adjunct Researcher
>
> Bioinformatics and Computational Biology Laboratory
> Functional Genomics Laboratory
> CV - CNPq / Web site
>
> Instituto Carlos Chagas / FIOCRUZ-PR
> Phone +55 41 3316 3236
> Fax +55 41 3316 3267
> R. Prof. Algacyr Munhoz Mader, 3775 -CIC
> Curitiba - PR - Brasil, 81350-010
>
>
> On Wed, Oct 10, 2012 at 5:40 PM, Mark Robinson <mark.robinson at imls.uzh.ch> wrote:
> Hi Fabricio,
>
> I suggest you check (at least) 2 things:
>
> 1.
> > disp <- estimateCommonDisp(b)
> > disp$common.dispersion = 0.0001004979
>
> > disp$common.dispersion = 3.999943
>
> Your example only makes 1 call to estimateCommonDisp(), but you have 2 drastically different values. Are you reporting these as the estimated values, or are you actually running this command and *setting* the common dispersion? It's not clear from your message.
>
> You may also want to study some of the GLM-based case studies in:
> http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>
> For example, the standard GLM work flow would be similar to that on Page 7.
>
>
> 2.
> > lrt <- glmLRT(b,fit,coef=fit$design)
>
> The docs for the 'coef' argument (?glmLRT) say:
> ----
> coef: integer or character vector indicating which coefficients of
> the linear model are to be tested equal to zero. Values must
> be columns or column names of ‘design’. Defaults to the last
> coefficient. Ignored if ‘contrast’ is specified.
> ----
> As you can see, the function is expecting something very different to what give as your 'coef' argument. Maybe you want 'coef=2:6', if you are looking for any difference between your 6 groups. Of course, maybe you actually want to split your factors into 2 … one of ("La","Lm","MO") and one of ("6h","24h") and construct a design matrix accordingly. But, this is also not clear from your message.
>
> Hope that helps,
> Mark
>
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
>
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson at imls.uzh.ch
> o: Y11-J-16
> w: http://tiny.cc/mrobin
>
> ----------
> http://www.fgcz.ch/Bioconductor2012
>
>
>
> On 10.10.2012, at 06:41, Fabricio Marchini wrote:
>
> > Hi,
> >
> > I'm using EdgeR to analyse a proteomic data with peptide counting. I have
> > limited experience on R/EdgeR/Statistics so I appreciate some help.
> > Using the follow code:
> >
> > a=file[,2:64]
> >
> > b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"
> > ),c(10,11,10,11,10,11)), lib.size=colSums(a))
> >
> > b <- calcNormFactors(b)
> >
> > times <- rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11,
> > 10,11))
> >
> > times <- factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"
> > ))
> >
> > design <- model.matrix(~factor(times))
> >
> > disp <- estimateCommonDisp(b)
> >
> > fit <- glmFit(b,design,dispersion=disp$common.dispersion)
> >
> > lrt <- glmLRT(b,fit,coef=fit$design)
> > disp$common.dispersion = 0.0001004979
> >
> > All proteins (3430) had a p.value of 0.
> >
> > I tried also with
> >
> > fit <- glmFit(b,design,dispersion=disp$common.dispersion)
> >
> > lrt <- glmLRT(b,fit,coef=fit$design)
> > disp$common.dispersion = 3.999943
> >
> > and that gave me all the proteins with p.value lower than 6.29E-05.
> >
> > That gave a signal that I'm doing something wrong or because of both common
> > dispersions my data is not a appropriate for the analysis.
> >
> > Any suggestions or corrections?
> >
> > --
> > Fabricio K. Marchini
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
More information about the Bioconductor
mailing list