[R] Using lmer with huge amount of data

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jul 24 21:18:47 CEST 2007


I think I am missing something here: what makes this 'huge' and 
'gigantic'?  You have not told us how many subjects you have, but in 
imaging experiments it is usually no more than 50 and often fewer.

For each subject you have 3 x 30,000 responses plus an age.  That is under 
1 MB of data per subject, so the problem looks modest unless you have many 
hundreds of subjects.
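
As a rough check on that figure, here is the arithmetic in R. It assumes the responses are stored as R doubles (8 bytes each), which the original post does not state, and the subject count of 50 is a hypothetical value taken from the upper end mentioned above.

```r
# Rough per-subject memory estimate; 8-byte doubles are an assumption.
n_levels <- 3          # levels of factor A
n_voxels <- 30000      # approximate voxel count from the original post
bytes_per_value <- 8   # storage size of one R double

# 3 x 30,000 responses plus one age value per subject
bytes_per_subject <- n_levels * n_voxels * bytes_per_value + bytes_per_value
mb_per_subject <- bytes_per_subject / 1e6

# Hypothetical subject count for a typical imaging study
n_subjects <- 50
total_mb <- n_subjects * mb_per_subject
```

Under these assumptions `mb_per_subject` stays below 1 and `total_mb` is in the tens of megabytes, well within what R can hold in memory.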

Nothing says you need to read the data in one go, but it will be helpful 
to have all the data available to R at once (although a DBMS interface 
would reduce the need for that).

I think the problem is rather going to be running 30,000 lmer fits, which 
in my experience often take seconds each, so the whole run could take many 
hours.  Each fit will only need a modest amount of data (3 responses and 
one age per subject).
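
A minimal sketch of such a per-voxel loop, assuming each level of factor A has already been read into a subjects-by-voxels matrix. The names `resp`, `age`, `voxel_frame`, and `fit_voxel` are hypothetical, the data are simulated, and the model formula simply mirrors the one in the question quoted below; it has not been tried on real imaging data.

```r
# Simulated stand-in for the real data: a list of three matrices
# (one per level of factor A), each n_subjects x n_voxels.
set.seed(1)
n_subjects <- 20
n_voxels <- 5   # tiny here; about 30,000 in the real problem
age <- round(runif(n_subjects, 20, 60))
resp <- lapply(1:3, function(k) {
  matrix(rnorm(n_subjects * n_voxels), n_subjects, n_voxels)
})

# Build the small long-format data frame needed for one voxel:
# 3 responses and one age per subject.
voxel_frame <- function(v, resp, age) {
  n <- length(age)
  data.frame(
    Response = unlist(lapply(resp, function(m) m[, v])),
    FactorA  = factor(rep(seq_along(resp), each = n)),
    Age      = rep(age, times = length(resp)),
    subject  = factor(rep(seq_len(n), times = length(resp)))
  )
}

# Fit the model at one voxel and keep only the fixed-effect estimates.
fit_voxel <- function(v, resp, age) {
  fit <- lme4::lmer(Response ~ FactorA + Age + (1 | subject),
                    data = voxel_frame(v, resp, age))
  lme4::fixef(fit)
}

# Data frame for the first voxel, to inspect its layout.
d1 <- voxel_frame(1, resp, age)

# Run the fits only if lme4 is installed; one column of estimates per voxel.
if (requireNamespace("lme4", quietly = TRUE)) {
  est <- sapply(seq_len(n_voxels), fit_voxel, resp = resp, age = age)
}
```

Only one small data frame exists at a time, so memory is not the constraint; the cost is the number of fits. Because the voxels are fitted independently, the loop could also be split into chunks and run separately.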

On Tue, 24 Jul 2007, Gang Chen wrote:

> Based on the examples I've seen in using statistical analysis
> packages such as lmer, it seems that people usually tabulate all the
> input data into one file with the first line indicating the variable
> names (or labels), and then read the file inside R. However, in my
> case I can't do that because of the huge amount of imaging data.
>
> Suppose I have a one-way within-subject ANCOVA with one covariate,
> and I would like to use lmer in R package lme4 to analyze the data.
> In the terminology of linear mixed models, I have a fixed factor A
> with 3 levels, a random factor B (subject), and a covariate (age)
> with a model like this
>
> MyResult <- lmer(Response ~ FactorA + Age + (1 | subject), MyData, ...)
>
> My input data are like this: For each subject I have a file (a huge
> matrix) storing the response values of the subject at many locations
> (~30,000 voxels) corresponding to factor A at the 1st level, another
> file for factor A at the 2nd level, and a 3rd file for factor A at
> the 3rd level. Then I have another file storing the age of those
> subjects. The analysis with the linear mixed model above would be
> done at each voxel separately.
>
> It seems impractical to create one gigantic file or matrix to feed
> into the above command because of the large number of voxels. I'm
> not sure how to proceed in this case. Any suggestions would be highly
> appreciated.
>
> Also if I'm concerned about any potential violation of sphericity
> among the 3 levels of factor A, how can I test sphericity violation
> in lmer? And if violation exists, how can I make corrections in
> contrast testing?
>
> Thank you very much,
> Gang

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


