[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment

Matthias Gralle matthias_gralle at eva.mpg.de
Wed Sep 16 17:41:46 CEST 2009


There were some wrong NA values in the provided data set, this is now 
corrected.
The data can be read in as

small=read.csv("small.csv",colClasses=c("character",rep("integer",2),rep("factor",5)))

The high number of residual df can be seen using the nlme package (can 
it be seen in the lme4 package, too ?):
library(nlme)
model1=lme(APC~(FITC+species+gene)^3,random=~1|day_repl,method="ML",data=small) 

anova(model1)
                  numDF denDF   F-value p-value
(Intercept)           1   789 1365.7400  <.0001
FITC                  1   789 3040.8168  <.0001
species               1   789   24.0521  <.0001
gene                  1   789   10.4035  0.0013
FITC:species          1   789    5.0690  0.0246
FITC:gene             1   789   10.7925  0.0011
species:gene          1   789   72.5823  <.0001
FITC:species:gene     1   789    4.6742  0.0309

In the original data set, denDF is >200 000.

Once again, how do I correctly account for pseudoreplication, avoiding 
the artificially high number of df ?

Thank you,

Matthias

Matthias Gralle wrote:
> Hello everybody,
>
> it may be better to have sample data. I have provided data with less 
> levels of "gene" and "day" and only ca. 400 data points per condition.
>
> Sample code:
> small=as.data.frame(read.csv("small.csv"))
> small$species=factor(small$species)
> small$gene=factor(small$gene)
> small$day=factor(small$day)
> small$repl=factor(small$repl)
> library(lme4)
> model1=lmer(APC~(FITC+species+gene)^3+(1|day)+(1|repl),REML=F,data=small)
> model2=lmer(APC~(FITC+species+gene)^2+(1|day)+(1|repl),REML=F,data=small)
> anova(model1,model2)
>
> gives me a significant difference, but in fact there are far too many 
> residual df, and this is much worse in the original data. I suppose I 
> cannot trust this difference.
>
> model3=lmer(APC~species*gene+(1|day/repl/FITC),data=small)
> Error: length(f1) == length(f2) is not TRUE
> In addition: Warning messages:
> 1: In FITC:(repl:day) :
>  numerical expression has 886 elements: only the first used
> 2: In FITC:(repl:day) :
>  numerical expression has 886 elements: only the first used
>
> Can I do nesting without incurring this kind of error ?
>
> And finally
> model4=lmer(APC~gene*species+(1|day) + (1|repl) + 
> (1+(gene:species)|FITC),data=small)
> model5=lmer(APC~gene+species+(1|day) + (1|repl) + (1|FITC),data=small)
> anova(model4,model5)
>
> works with this reduced data set, but fails to converge for the 
> original, much larger data set. I am unsure if this is the right kind 
> of analysis for the data and there is only a problem of convergence, 
> or if it is the wrong formula.
>
> Can anybody give me some advice on this problem ? Or should I just 
> stick to reducing the data from each condition to a single parameter, 
> as explained in my first email below?
>
> Thank you again!
>
> Matthias Gralle wrote:
>> Hello everybody,
>>
>> I have been trying for some weeks to state the correct design of my 
>> experiment as a GLM formula, and have not been able to find something 
>> appropriate in Pinheiro & Bates, so I am posting it here and hope 
>> somebody can help me.
>>
>> In each experimental condition, described by
>> 1) gene (10 levels, fixed, because of high interest to me)
>> 2) species (2 levels, fixed, because of high interest)
>> 3) day (2 levels, random)
>> 4) replicate (2 levels per day, random),
>>
>> I have several thousand data points consisting of two variables:
>>
>> 5) FITC (level of transfection of a cell)
>> 6) APC (antibody binding to the cell)
>> Because of intrinsic and uncontrollable cell-to-cell variation, FITC 
>> varies quite uniformly over a wide range, and APC correlates rather 
>> well with FITC. In some cases, I pasted day and replicate together as 
>> day_repl.
>>
>> My question is the following:
>>
>> Is there any gene (in my set of 10 genes) where the species makes a 
>> difference in the relation between FITC and APC ? If yes, in what 
>> gene does species have an effect ? And what is the effect of the 
>> species difference ?
>>
>> My attempts are the following:
>> 1. Fit the data points of each experimental condition to a linear 
>> equation APC=Intercept+Slope*FITC and analyse the slopes :
>> lm(Slope~species*gene*day_repl)
>> This analysis shows clear differences between the genes, but no 
>> effect of species and no interaction gene:species.
>>
>> The linear fit to the cells is reasonably good, but of course does 
>> not represent the data set completely, so I wanted to incorporate the 
>> complete data set.
>>
>> 2a. lmer(APC~FITC*species*gene+(1|day)+(1|repl))
>> This gives extremely significant values for any interaction and 
>> variable because there are >200 000 df. Of course, it cannot be true, 
>> because the cells are not really independent. I have done many 
>> variations of the above, e.g.
>> 2b. lmer(APC~FITC*species*gene+(1|day)+(1+FITC|day_repl)),
>> but they all suffer from the excess of df.
>>
>> 3. lmer(APC~species*gene+(1|day/repl/FITC) gives several warning 
>> messages like this one:
>> In repl:day :
>> numerical expression has 275591 elements: only the first used
>>
>> 4. lmer(APC~gene*species+(1|day_repl) + (1+gene:species|FITC)) ran 
>> several days, but failed to converge...
>>
>> Can somebody give me any hint, or do you think the only possible 
>> analysis is a simplification as in my model 1 ?
>>
>> By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 on 
>> a linux 2.6.24-24-generic kernel on different Intel systems. I am 
>> using the lme4 that came with R 2.8.0.
>>
>> Thank you very much for your time!
>>
>> -- Matthias Gralle, PhD
>> Dept. Evolutionary Genetics
>> Max Planck Institute for Evolutionary Anthropology
>> Deutscher Platz 6
>> 04103 Leipzig, Germany
>> Tel +49 341 3550 519
>> Fax +49 341 3550 555
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   


-- 
Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555



More information about the R-help mailing list