[Rd] Model frame when LHS is cbind (PR#14189)

Arni Magnusson arnima at hafro.is
Thu Jan 21 21:30:15 CET 2010


Thank you, Prof. Ripley, for examining this issue. I have two more 
questions on this topic, if I may.

(1) Truncated column names

With your explanations, I can see that the problem of missing column names 
originates in cbind() and the 'deparse.level' bug we have just discovered. 
I had already tried different 'deparse.level' values, but none of them 
solved my problem of missing column names.

   attach(mtcars)
   cbind(qsec,log(hp),sqrt(disp), deparse.level=0)  # no column names
   cbind(qsec,log(hp),sqrt(disp), deparse.level=1)  # qsec only
   cbind(qsec,log(hp),sqrt(disp), deparse.level=2)  # no column names
   cbind(qsec=qsec,log(hp),sqrt(disp), deparse.level=2)  # works!
   cbind(qsec=qsec,log(hp),sqrt(abs(disp)), deparse.level=2)  # hmm...

Now a new question arises. The last line generates these truncated column 
names:

   "qsec"  "log(hp)"  "sqrt(abs(d..."

where the dots are not mine but were added by R, presumably to keep the 
column names no longer than 13 characters. I would prefer to retain the 
full column names, as in

   as.matrix(data.frame(qsec,log(hp),sqrt(abs(disp)), check.names=FALSE))

where the column names are

   "qsec"  "log(hp)"  "sqrt(abs(disp))"

Is there a reason why cbind() should truncate column names? Matrices have 
no problem with very long column names.


(2) Changing the default 'deparse.level' to 2

Furthermore, since many users appreciate the compact model formula syntax 
in R, it would be great if the formula

   cbind(qsec, log(hp), sqrt(disp)) ~ wt

would produce a model frame with full column names, without sacrificing 
legibility by inserting deparse.level=2 among the variable names. The 
simplest way to achieve this would be to change the default value of 
'deparse.level' to 2 in cbind(), and probably also in rbind().

Am I missing some important cases where functions or users rely on some 
of the column names being empty, as produced by deparse.level=1? And if 
so, do these cases outweigh the benefits of clean and compact formula 
syntax when modelling?
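
As a possible stop-gap until cbind() itself changes, here is a minimal 
sketch. The helper cbind_named() is hypothetical (my own illustration, not 
part of base R or any package). It names each column after the full 
deparsed argument expression, keeps explicit names such as a = qsec, and 
assumes that every argument is a single vector contributing one column. 
It relies only on substitute(), deparse() and cbind().

```r
# Hypothetical helper, not part of base R: bind vectors column-wise and
# name each column after its full (untruncated) deparsed expression,
# unless the argument was explicitly named, e.g. a = qsec.
cbind_named <- function(...) {
  # Unevaluated argument expressions, e.g. qsec, log(hp), sqrt(abs(disp))
  exprs <- as.list(substitute(list(...)))[-1L]
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  # Deparse each expression to a single string, without abbreviation
  labels <- vapply(exprs,
                   function(e) paste(deparse(e, width.cutoff = 500L),
                                     collapse = " "),
                   character(1))
  # Explicit names win; empty names fall back to the deparsed expression
  nms[nms == ""] <- labels[nms == ""]
  out <- do.call(cbind, list(...))
  # Assumption: each argument contributes exactly one column
  stopifnot(ncol(out) == length(nms))
  colnames(out) <- nms
  out
}

# Column names taken from the expressions, with no truncation
m <- with(mtcars, cbind_named(qsec, log(hp), sqrt(abs(disp))))
colnames(m)

# Inside a model formula, without any deparse.level argument
fm <- lm(cbind_named(qsec, log(hp), sqrt(abs(disp))) ~ wt, data = mtcars)
colnames(model.frame(fm)[[1]])
```

Because explicit names take precedence, a call such as 
cbind_named(a = qsec, b = log(hp), sqrt(disp)) would keep a and b and name 
the third column sqrt(disp). Matrix arguments that contribute several 
columns are not handled by this sketch.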


Many thanks,

Arni



On Thu, 21 Jan 2010, Prof Brian Ripley wrote:

> A few points.
>
> 0) This seems to be a wishlist item, but the report does not say so 
> (see the section on BUGS in the FAQ).
>
> 1) A formula does not need to have an lhs, and it is an assumption that 
> the response is the first element of 'variables' (an assumption not made 
> a couple of lines later when 'resp' is used).
>
> 2) I don't think this is the best way to get names.  If I do
>
> fm <- lm(cbind(a=qsec,b=log(hp),sqrt(disp))~wt, data=mtcars)
>
> I want a and b as names, but that is not what your code gives. And if I 
> do
>
>> X <- with(mtcars, cbind(a = qsec, b = log(hp), c=sqrt(disp)))
>> fm <- lm(X ~ wt, data=mtcars)
>> model.frame(fm)[[1]]
>       [,1]     [,2]      [,3]
>
> You've lost the names that the current code gives.
>
> The logic is that if you use a lhs which is a matrix with column names, 
> then those names are used.  If (as you did), you use one with empty 
> column names, that is what you get in the model frame.  This seems much 
> more in the spirit of R than second-guessing that the author actually 
> meant to give column names and create them, let alone renaming the 
> columns to be different from the names supplied.
>
> 3) It looks to me as if you wanted
>
> cbind(qsec, log(hp), sqrt(disp), deparse.level=2)
>
> but that does not give names (despite the description).  And that, I 
> think, is a bug that can easily be changed.  That way we can fulfil your 
> wish without breaking other people's code.
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


