[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment

Matthias Gralle matthias_gralle at eva.mpg.de
Wed Sep 16 11:10:42 CEST 2009


Hello everybody,

it may be better to have sample data. I have provided data with less 
levels of "gene" and "day" and only ca. 400 data points per condition.

Sample code:
small=as.data.frame(read.csv("small.csv"))
small$species=factor(small$species)
small$gene=factor(small$gene)
small$day=factor(small$day)
small$repl=factor(small$repl)
library(lme4)
model1=lmer(APC~(FITC+species+gene)^3+(1|day)+(1|repl),REML=F,data=small)
model2=lmer(APC~(FITC+species+gene)^2+(1|day)+(1|repl),REML=F,data=small)
anova(model1,model2)

gives me a significant difference, but in fact there are far too many 
residual df, and this is much worse in the original data. I suppose I 
cannot trust this difference.

model3=lmer(APC~species*gene+(1|day/repl/FITC),data=small)
Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) :
  numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) :
  numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error ?

And finally
model4=lmer(APC~gene*species+(1|day) + (1|repl) + 
(1+(gene:species)|FITC),data=small)
model5=lmer(APC~gene+species+(1|day) + (1|repl) + (1|FITC),data=small)
anova(model4,model5)

works with this reduced data set, but fails to converge for the 
original, much larger data set. I am unsure if this is the right kind of 
analysis for the data and there is only a problem of convergence, or if 
it is the wrong formula.

Can anybody give me some advice on this problem ? Or should I just stick 
to reducing the data from each condition to a single parameter, as 
explained in my first email below?

Thank you again!

Matthias Gralle wrote:
> Hello everybody,
>
> I have been trying for some weeks to state the correct design of my 
> experiment as a GLM formula, and have not been able to find something 
> appropriate in Pinheiro & Bates, so I am posting it here and hope 
> somebody can help me.
>
> In each experimental condition, described by
> 1) gene (10 levels, fixed, because of high interest to me)
> 2) species (2 levels, fixed, because of high interest)
> 3) day (2 levels, random)
> 4) replicate (2 levels per day, random),
>
> I have several thousand data points consisting of two variables:
>
> 5) FITC (level of transfection of a cell)
> 6) APC (antibody binding to the cell)
> Because of intrinsic and uncontrollable cell-to-cell variation, FITC 
> varies quite uniformly over a wide range, and APC correlates rather 
> well with FITC. In some cases, I pasted day and replicate together as 
> day_repl.
>
> My question is the following:
>
> Is there any gene (in my set of 10 genes) where the species makes a 
> difference in the relation between FITC and APC ? If yes, in what gene 
> does species have an effect ? And what is the effect of the species 
> difference ?
>
> My attempts are the following:
> 1. Fit the data points of each experimental condition to a linear 
> equation APC=Intercept+Slope*FITC and analyse the slopes :
> lm(Slope~species*gene*day_repl)
> This analysis shows clear differences between the genes, but no effect 
> of species and no interaction gene:species.
>
> The linear fit to the cells is reasonably good, but of course does not 
> represent the data set completely, so I wanted to incorporate the 
> complete data set.
>
> 2a. lmer(APC~FITC*species*gene+(1|day)+(1|repl))
> This gives extremely significant values for any interaction and 
> variable because there are >200 000 df. Of course, it cannot be true, 
> because the cells are not really independent. I have done many 
> variations of the above, e.g.
> 2b. lmer(APC~FITC*species*gene+(1|day)+(1+FITC|day_repl)),
> but they all suffer from the excess of df.
>
> 3. lmer(APC~species*gene+(1|day/repl/FITC) gives several warning 
> messages like this one:
> In repl:day :
> numerical expression has 275591 elements: only the first used
>
> 4. lmer(APC~gene*species+(1|day_repl) + (1+gene:species|FITC)) ran 
> several days, but failed to converge...
>
> Can somebody give me any hint, or do you think the only possible 
> analysis is a simplification as in my model 1 ?
>
> By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 on 
> a linux 2.6.24-24-generic kernel on different Intel systems. I am 
> using the lme4 that came with R 2.8.0.
>
> Thank you very much for your time!
>
> -- Matthias Gralle, PhD
> Dept. Evolutionary Genetics
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> 04103 Leipzig, Germany
> Tel +49 341 3550 519
> Fax +49 341 3550 555
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>


-- 
Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555



More information about the R-help mailing list