[R] Observed responses in 'augPred' data frame - Wrong order ?

Marco Barbàra jabbadhutt at libero.it
Wed Oct 8 19:30:07 CEST 2008


Dea-R community.

I'd like to draw your attention to an issue I have recently
encountered while doing my current data analysis.

I've got an unexpected (to me) result from the command:

> augPred(lmList(my.object)),

'my.object' being a grouped data frame of class:

> class(my.object)
[1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame"

The problem is more easily explained by showing these two to Trellis
plot:

> plot(my.object,some_options)
> plot(augPred(lmList(my.object)),some_options)

[link to output: http://www.palug.net/Members/jabba ]

Clearly the problem is that the predictions are correct, but the
responses in the ``original'' part of the augPred data frame are in the
wrong order, i.e. the response order does not match the grouping
factor order (not sure this is in correct English, sorry).

Debugging the function 'augPred.lmList' I've managed to understand that
the problem resides in the following function calls:

> debug(nlme:::augPred.lmList)
> fm1.lis <- lmList(my.object)
> augPred(fm1.lis)
debugging in: augPred.lmList(fm1.lis)
debug: {
    .....
    primary <- getCovariate(data)
    .....
    groups <- getGroups(object)
    .....
    orig <- data.frame(primary,groups, getResponse(object))
    .....
}

This approach works well on other datasets, such us the Orthodont data,
but not for mine because, i guess, my rows are not first ordered by
subject (subject is my grouping factor). In fact:

> getGroups(fm1.lis)[1:5]
[1] CBT.1 CBT.2 CBT.3 CBT.4 CBT.5
34 Levels: CBT.14 < CBT.2 < CBT.11 < CBT.13 < CBT.12 < CBT.5 < ... < V.6
> getResponse(fm1.lis)[1:5]
CBT.1 CBT.1 CBT.1 CBT.1 CBT.2 
   34    32    29    27    28 

My question: 
is there a simple way to get the proper result (modify the dataset or
modify the function)? RTFM responses are welcomed.

My (humble?) opinion:
Neither Pinheiro-Bates (2000) nor online documentation, report the
need for a particular row order in the costruction of grouped data
objects, and i think it should be better do not rely on it. Or, at
least, to document it somewhere.


I want to thank the R-core team and the entire R community for this
amazingly efficient and versatile software, and i am very proud that the
(not only) statistical state-of-the-art software have a ``gpl
distribution'' (which have many desirable properties ;-) ).

Cheers.

+---------------------------+
|Relevant system information|
+---------------------------+

> R.version
               _                           
platform       i486-pc-linux-gnu           
arch           i486                        
os             linux-gnu                   
system         i486, linux-gnu             
status                                     
major          2                           
minor          7.2                         
year           2008                        
month          08                          
day            25                          
svn rev        46428                       
language       R                           
version.string R version 2.7.2 (2008-08-25)

> sessionInfo()
R version 2.7.2 (2008-08-25) 
i486-pc-linux-gnu 

locale:
LC_CTYPE=it_IT;LC_NUMERIC=C;LC_TIME=it_IT;LC_COLLATE=it_IT;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=it_IT;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
base     

other attached packages:
[1] MASS_7.2-44     lattice_0.17-15 nlme_3.1-89    

loaded via a namespace (and not attached):
[1] grid_2.7.2  tools_2.7.2


-- 
  Marco Barbàra

- Undergraduate student in Statistics 
  at the University of Palermo (Italy)

- Only free-as-in-freedom software user

- Emacs lover



More information about the R-help mailing list