[R] Differing Variable Length Inconsistencies in Random Effects/Regression Models

Wed Jul 15 19:02:35 CEST 2009

Dear All,

I am quite new to R and am having trouble fitting a linear model with 
random effects, and a follow-up regression. In both cases R reports that 
my variable lengths differ, and the model refuses to compute any further.

The code I have been using is as follows:

> vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
> attach(vc)
>
> family<-factor(family)
> ## assign the 10 columns containing marker data to a new variable,
> ## as the column names are themselves not in any recognisable sequence
> colms<-(vc)[,4:13]
>
> vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
> library(lme4)
>
> for (c in levels(family))
+ {    for (i in 1:length(colms))
+        { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
+        }
+    summ<-summary(fit)
+    av<-anova(fit)
+    print(summ)
+    print(av)
+ }

This gives me:

Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 +  :
  variable lengths differ (found for 'c')
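
For what it is worth, the same message can be reproduced without lme4. 
This is only a minimal sketch, assuming the cause is that the loop variable 
'c' holds a single family label (length 1) rather than a column of vcdf. 
The data frame 'toy' and its values are made up for illustration.

```r
# Hypothetical toy data standing in for vcdf
toy <- data.frame(peg.no = c(10, 12, 9, 14, 11))

# Inside the loop, c is one level of family: a character vector of length 1
c <- "fam1"

# The formula looks for a variable called c, does not find it in toy,
# and picks up the length-1 loop variable instead of a 5-row column
msg <- tryCatch(
  {
    model.frame(peg.no ~ c, data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, the grouping term would need to name actual 
columns of the data frame, e.g. (1|family), rather than the loop counters.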

I posted a similar message on the R mixed-models list a few days ago about 
my basic approach, and Jerome Goudet kindly pointed me to an alternative: 
obtain the residuals from a random effects model in lmer(), then regress 
those residuals (the dependent variable) on each of my marker data columns 
(the independent variables).

The code for that is as follows:

> vc<-read.table("P:\\R\\Text Files\\Testvcomp10.txt",header=T,sep="",dec=".",na.strings=NA,strip.white=T)
> attach(vc)
>
> family<-factor(family)
> colms<-(vc)[,4:13]
>
> names(vc)
 [1] "male.parent"  "family"       "offspring.id" "P1L55"        "P1L73"
 [6] "P1L74"        "P1L77"        "P1L91"        "P1L96"        "P1L98"
[11] "P1L100"       "P1L114"       "P1L118"       "peg.no"       "ec.length"
[16] "syll.length"
>
> vcdf<-data.frame(family, colms, peg.no, ec.length, syll.length)
>
> library(lme4)

> famfit<-lmer(peg.no~1 + (1|family), na.action=na.omit, vcdf)
> resfam<-residuals(famfit)
> for( i in 1:length(colms))
+ {
+ print ("Marker", i)
+ regfam<-abline(lm(resfam~i))
+ print(regfam)
+ }

This again gives me the error:

[1] "Marker"
Error in model.frame.default(formula = resfam ~ i, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'i')
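
The same thing appears to be happening here: in resfam ~ i, 'i' is the 
loop counter (a single number), not the i-th marker column. Below is a 
rough sketch of one possible way to loop over marker columns by name, on 
made-up data. 'toy', 'M1', 'M2' and 'res' are hypothetical stand-ins for 
vcdf, the marker columns and the residuals, and the residuals are assumed 
to already sit in the same data frame as the markers.

```r
# Hypothetical toy data: residuals plus two marker columns with
# missing values in different rows
set.seed(1)
toy <- data.frame(
  res = rnorm(8),
  M1  = c(0, 1, 2, NA, 1, 0, 2, 1),
  M2  = c(1, 1, NA, 0, 2, 2, 0, 1)
)
markers <- c("M1", "M2")

fits <- list()
for (m in markers) {
  # Build the formula res ~ <marker name> from the column name
  f <- reformulate(m, response = "res")
  # lm() drops incomplete rows for this marker only (default na.omit)
  fits[[m]] <- lm(f, data = toy)
  cat("Marker", m, "\n")
  print(coef(fits[[m]]))
}
```

Because both variables come from one data frame, each fit uses only the 
rows that are complete for that particular marker.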

All of my variables have missing values somewhere. The missing values are 
not in the same places for every individual, i.e. some individuals are 
missing some values and others are missing different ones. However much I 
try na.action=na.omit and na.rm=T, the differing variable length problem 
persists.

I also tried to extract the residuals, save them in a new variable (called 
'resfam' here), and add that to the data frame 'vcdf' that I had created 
earlier.

The problem was that when the residuals were computed, lmer() dropped the 
rows with missing data in 'peg.no' or 'family'. Those are not the same 
rows that are missing for another variable, e.g. 'ec.length', so the 
residual vector is shorter than the other columns and the lengths are 
again inconsistent. data.frame() then refused to add the residuals to the 
existing set. This was fairly predictable from the start, but I decided to 
try it anyway. It didn't work.
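
A minimal sketch of the length mismatch described above, and of one 
possible way around it. It uses lm() from base R on made-up data rather 
than lmer(), so whether a given lme4 version pads residuals in the same 
way is an assumption that would need checking. 'toy', 'y' and 'g' are 
hypothetical names.

```r
# Hypothetical toy data: six rows, one missing response
toy <- data.frame(
  y = c(1.2, NA, 3.1, 4.0, 5.3, 6.1),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

# na.omit drops the incomplete row, so the residuals are shorter
# than the data frame they came from
fit_omit <- lm(y ~ g, data = toy, na.action = na.omit)
res_omit <- residuals(fit_omit)
length(res_omit)

# na.exclude also drops the row for fitting, but residuals() pads
# the result with NA so it lines up with the original rows
fit_excl <- lm(y ~ g, data = toy, na.action = na.exclude)
res_excl <- residuals(fit_excl)
length(res_excl)

# The padded residuals can be stored next to the other columns
toy$res <- res_excl
print(toy)
```

With na.omit the residual vector has one element fewer than the data frame 
has rows, which is the mismatch that data.frame() complains about.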

I apologise if the way to handle these differing variable lengths is 
obvious, but I don't know R that well at all.

My data files can be downloaded at the following location:

<http://www.filesanywhere.com/fs/v.aspx?v=896d6b88616173be71ab>
(Excel .xlsx file)

<http://www.filesanywhere.com/fs/v.aspx?v=896d6b88616174a76e9e>
(.txt file)

Any pointers would be greatly appreciated, as this is holding me up loads.

Thanks a ton for your help,

Aditi

----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol