[BioC] edgeR: GLM & residuals and model fitting & hypothesis testing

Gordon K Smyth smyth at wehi.EDU.AU
Thu Feb 16 23:02:48 CET 2012


On Thu, 16 Feb 2012, Susanne Franssen wrote:

> Dear Gordon,
>
> thanks, a lot for you answer, however I am a little more uncertain now
> what to do.
>
>> 1) You may know that there are many different definitions of "residuals"
>> from a generalized linear models, and none of them have the properties of
>> residuals from normal linear models.  I have not found residuals very useful
>> myself in a genomic differential expression context.
>>
>> What sort of residuals did you want, and what were you planning to do with
>> them?
>
> Indeed I am not aware of this difference!
> I was thinking of the residual errors in a normal linear model. The
> problem in my analysis is that I only have 3 df (4 libs) to start
> with, so I cannot model the complete model
> individual+treatment+individual*treatment. So an idea was to model
> individual+treatment and afterwards do a model on the interaction only
> on the remaining residual errors as new observed vector.
> Is something like this at all possible or am I completely wrong here?

No, it is not possible.

Please read the section in the edgeR User's Guide "What to do if you have 
no replicates".

>> 2) You must always fit the model with all relevant factors:
>> individual*treatment.  You cannot do meaningful inference from a reduced
>> model until you have determined that the other factors are not important.
>> Hence you have to deal with the interaction first.
>
> It makes sense to have to put in all relevent factors to the model.
> However, concerning this I don't understand your comment - I have to
> deal with the interaction first, wouldn't the full model be:
> individual+treatment+individual*treatment (which I can't do because of
> a lack of df)

In R, the interaction is represented by individual:treatment.  The 
notation individual*treatment is a shorthand for interaction plus main 
effects, i.e., individual*treatment = 
individual+treatment+individual:treatment.

Please read the "Introduction to R" that comes with your R installation, 
in particular Section 11 on statistical models.

Best wishes
Gordon

> Cosidering my samples & factors:
>>> I have a fully crossed design with 2 factors and 2 factor levels each:
>>> individual <- as.factor(c("indA","indA","indB","indB"))
>>> treatment <- as.factor(c("treat1","treat2","treat1","treat2"))
>
> how would the cmd for the design matrix look like:
> design <- model.matrix(~??)
> and with which cmd would I fit the model and test for it:
> fit <- glmFit(d.GLM,design)
> lrt <- glmLRT(d.GLM,fit, coeff=?)
>
> Thanks a lot,
> Susanne
>
>
>
>>
>> Best wishes
>> Godon
>>
>>> Date: Tue, 14 Feb 2012 17:44:31 +0100
>>> From: Susanne Franssen <s.franssen at uni-muenster.de>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis
>>>        testing
>>>
>>> Dear all,
>>>
>>>
>>> 1) GLM & residuals:
>>>
>>> I have a question concerning the use of GLMs in edgeR and the analysis
>>> of the residuals after model fitting.
>>>
>>> I have followed all the steps until model fitting, e.g.:
>>> glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion)
>>>
>>> The results I obtain from the fitting are the following catgories:
>>>>
>>>> names(glmfit.D)
>>>
>>> [1] "coefficients"  "fitted.values" "fail"          "not.converged"
>>> [5] "deviance"      "df.residual"   "abundance"     "design"
>>> [9] "offset"        "dispersion"    "method"        "counts"
>>> [13] "samples"
>>>
>>> What would be the best way to obtain the residuals for the "genewise"
>>> GLMs?
>>>
>>>
>>>
>>> 2) model fitting & hypothesis testing:
>>>
>>> I have a fully crossed design with 2 factors and 2 factor levels each:
>>> individual <- as.factor(c("indA","indA","indB","indB"))
>>> treatment <- as.factor(c("treat1","treat2","treat1","treat2"))
>>>
>>> in general I would be interested in 3 different aspects:
>>> a) effect of individual
>>> b) effect of treatment
>>> c) interaction between individual and treatment
>>>
>>> What would be the best way to test for those effects, would I rather
>>> test for all three aspects individually, i.e.:
>>> a) design <- model.matrix(~individual)
>>> b) design <- model.matrix(~treatment)
>>> c) design <- model.matrix(~individual*treatment)
>>>
>>> or doesn't it also make sense to model
>>> design <- model.matrix(~individual+treatment)
>>> and test for
>>> a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2)
>>> b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3)
>>> ... this way the effect of both factors could be accounted for in the
>>> model?!
>>>
>>>
>>> Thanks a lot!
>>> Susanne

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list