[BioC] robust linear model in anovas

Arne.Muller at aventis.com Arne.Muller at aventis.com
Tue Apr 6 17:52:03 CEST 2004


Hello,

I'm trying to analyze some toxicology data, and I got some problems with lm
and rlm for which I'm looking for help. The experiment is a 3 factor design:

- the value (the y vector) is a response for a treatment with a drug
- a dose factor with 5 levles (a drug in 5 concentrations)
- a time factor with 2 levels (time of treatment)
- a batch factor with 3 levels corresponding to the laboratories that
analyzed the data

Within each dose, time and batch I've 3 "technical" replicates. I'm looking
for genes for which the expression is altered mainly by the dose at any of
the time points.

Ideally I'd just like to merge the 3 replicates produced by each lab into one
group with then 9 replicates (i.e. omitting the batch factor). Unfortunately
the batch factor is very strong and introduces quite a large variance within
each time and dose level.

I know that sometimes one laboratory is quite a strong outlier, but the
"extreme" lab changes from gene to gene! When looking at a stripchart for the
dose (within one timepoint) and merged batches the datapoints are scattered,
and often 2 or 3 of the replicates from one lab are >2 standard deviations
away from the mean.

I'm doing a simple lm to quantify the main effects (creating an lm for each
gene).

fit <- lm(value ~ dose + time + batch, d)

> summary(fit)$coefficients
                Estimate Std. Error     t value     Pr(>|t|)
(Intercept)  6.772575997 0.04040316 167.6249085 2.174127e-99
dose010mM   -0.038755967 0.04435994  -0.8736704 3.850501e-01
dose025mM   -0.028481675 0.04598919  -0.6193123 5.375628e-01
dose050mM   -0.069415506 0.05087185  -1.3645171 1.764316e-01
dose100mM    0.008306756 0.04435994   0.1872580 8.519573e-01
time24h      0.015798762 0.02976119   0.5308511 5.970699e-01
batchOLD     0.134205422 0.03534714   3.7967832 2.930010e-04
batchPRG     0.040506343 0.03837539   1.0555291 2.945274e-01

> anova(fit)
Analysis of Variance Table

Response: value
          Df  Sum Sq Mean Sq F value   Pr(>F)   
dose       4 0.06086 0.01521  0.8180 0.517660   
time       1 0.00524 0.00524  0.2818 0.597070   
batch      2 0.27889 0.13944  7.4968 0.001068 **
Residuals 76 1.41362 0.01860                    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> anova(rfit)
Analysis of Variance Table


To minumize the batch effect, e.g. to find true influences of the dose I
thought it may be a good idea to exclude obvious outliers. I've done this by
excludung data point >2 SD from the mean within a dose at a timepoint. I
guess that's not realy elegant ... .

Would it be apporpiate to use a robust lm (rlm or lqs) to "downgrade" the
outliers in importance? Could a robutsly fitted linear model be used for an
anova, e.g. do weight the data within the groups? E.g.

rfit <- rlm(value ~ dose + time + batch, d)

Why are there no p-values given for the t-statistics?

> summary(rfit)$coefficients
                   Value Std. Error     t value
(Intercept)  6.781613686 0.04215968 160.8554538
dose010mM   -0.038060320 0.04628847  -0.8222418
dose025mM   -0.042400425 0.04798856  -0.8835528
dose050mM   -0.078787526 0.05308349  -1.4842192
dose100mM    0.008569384 0.04628847   0.1851300
time24h      0.005742874 0.03105505   0.1849256
batchOLD     0.140608264 0.03688384   3.8121911
batchPRG     0.031820089 0.04004375   0.7946331

... and no p-values for the F-stats/F-values either ...  

> anova(rfit)
Analysis of Variance Table

Response: value
          Df  Sum Sq Mean Sq F value Pr(>F)
dose       4 0.07794 0.01949               
time       1 0.00165 0.00165               
batch      2 0.29484 0.14742               
Residuals    1.42161   


lqs doesn't have any implementation for anova, is it theoretically not
possible to use the model for an anova?

As an alternative I'm thinking about using the "weights" option for lm. Maybe
the distance from the mean in units of standard deviations

  w <- abs(mean(i) - i) / sd(i)
  w <- 1/w  # to give small weight to outliers

I'd be happy to discuss these options with you.

	kind regards + thanks a lot for your comments,

	Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com



More information about the Bioconductor mailing list