[R] Help with complex lme model fit

Fri May 25 17:15:40 CEST 2007

 Hi R helpers,

 I'm trying to fit a rather complex model to some simulated data using
 lme and am not getting the correct results. It seems there might be
some
 identifiability issues that could possibly be dealt with by
specifying
 starting parameters - but I can't see how to do this. I'm comparing
 results from R to those got when using GenStat...

 The raw data are available on the web at
http://cmbeale.freehostia.com/OutData.txt and can be read directly
into R using:

 gpdat <- read.table("http://cmbeale.freehostia.com/OutData.txt",
header = T)
 gpdat$X7 <- as.factor(gpdat$X7)
 gpdat$X4 <- as.factor(gpdat$X4)
 rand_mat <- as.matrix(gpdat[,11:26])
 gpdat <- groupedData(Y1 ~X1 + X2 + X3 + X4 + X5 + m_sum|.g, data =
gpdat)

 the model fitted using:

     library(Matrix)
     library(nlme)

     m_sum <- rowSums(gpdat[,11:27])
     mod1 <- lme(fixed = Y1 ~ X1 + X2 + X3 + X4 + X5 +  m_sum,
                   random = pdBlocked(list(pdIdent(~1), pdIdent (~ X6
-
 1),
                   pdIdent (~ X7 - 1), pdIdent(~ rand_mat -1))), data
=
 gpdat)

 Which should recover the variance components:

                 var_label            var_est
          rand_mat_scalar         0.00021983
         X6_scalar                       0.62314002
         X7_scalar        0.03853604

 as recovered by GenStat and used to generate the dataset. Instead I
 get:

 X6      0.6231819
 X7     0.05221481
 rand_mat    1.377596e-11

 However, If I change or drop either of X5 or X6. I then get much
closer
 estimates to what is expected. For example:

     mod2 <- lme(fixed = Y1 ~ X1 + X2 + X3 + X4 + X5 +  m_sum,
                   random = pdBlocked(list(pdIdent(~1), pdIdent (~ X6
-
 1),
                   pdIdent (~as.numeric( X7) - 1), pdIdent(~ rand_mat
 -1))), data = gpdat)

 returns variance components:
 X6      0.6137986
 X7     Not meaningful
 rand_mat    0.0006119088

 which is much closer to those used to generate the dataset for the
 parameters that are now meaningful, and has appropriate random effect
 estimates for the -rand_mat columns (the variable of most interest
 here). This suggests to me that there is some identifiability issue
that
 might be helped by giving different starting values. Is this
possible?
 Or does anyone have any other suggestions?

 Thanks,

 Colin

 sessionInfo:
R version 2.5.0 (2007-04-23) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United Kingdom.1252;
LC_CTYPE=English_United Kingdom.1252;
LC_MONETARY=English_United Kingdom.1252;
LC_NUMERIC=C;
LC_TIME=English_United Kingdom.1252

attached base packages:
[1] "stats"     "graphics"  "grDevices" "datasets"  "tcltk"     "utils"
    "methods"   "base"     

other attached packages:
       nlme      Matrix     lattice    svSocket        svIO      R2HTML
     svMisc       svIDE 
   "3.1-80" "0.9975-11"    "0.15-5"     "0.9-5"     "0.9-5"      "1.58"
    "0.9-5"     "0.9-5" 

Dr. Colin Beale
Spatial Ecologist
The Macaulay Institute
Craigiebuckler
Aberdeen
AB15 8QH
UK

Tel: 01224 498245 ext. 2427
Fax: 01224 311556
Email: c.beale at macaulay.ac.uk 

-- 
Please note that the views expressed in this e-mail are those of the
sender and do not necessarily represent the views of the Macaulay
Institute. This email and any attachments are confidential and are
intended solely for the use of the recipient(s) to whom they are
addressed. If you are not the intended recipient, you should not read,
copy, disclose or rely on any information contained in this e-mail, and
we would ask you to contact the sender immediately and delete the email
from your system. Thank you.
Macaulay Institute and Associated Companies, Macaulay Drive,
Craigiebuckler, Aberdeen, AB15 8QH.