[R] poly objects as data frame columns

David Winsemius dwinsemius at comcast.net
Sat Jul 18 00:33:51 CEST 2009


On Jul 17, 2009, at 5:25 PM, Ulrike Grömping wrote:
>
> David Winsemius wrote:
>>
>>
>> On Jul 17, 2009, at 3:24 PM, Ulrike Grömping wrote:
>>
>>>
>>> David,
>>>
>>> thanks. Your explanation does not quite fit, though, as it refers to
>>> using function data.frame, while I assigned the new column with $<-.
>>> poly() does return an object of classes poly and matrix, not
>>> model.matrix,
>>
>> But model.matrix is not a class as far as I can tell. It has no
>> "is.<>" function, and examining a sample model matrix does not
>> indicate that it carries a special class attribute.
>>
> It is a class all right, but apparently it is not assigned by default to
> objects generated with the function model.matrix. Try
> mm <- model.matrix(lm(swiss))
> str(data.frame(swiss,mm))
> class(mm) <- c("model.matrix","matrix")
> str(data.frame(swiss,mm))
>

Not what I expected. It seems odd to need to assert a class in order to
get that effect.
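
For anyone who wants to reproduce the effect, here is a minimal sketch of the quoted example with the two cases kept side by side. The names d_split, mm_tagged and d_single are mine and purely illustrative; I am assuming a reasonably current R in which the model.matrix method for as.data.frame() is still present.

```r
# Same model as in the quoted example; 'swiss' ships with base R.
mm <- model.matrix(lm(swiss))

# Without an explicit "model.matrix" class, data.frame() splits the
# matrix into one data frame column per matrix column.
d_split <- data.frame(swiss, mm)

# With the class asserted, as.data.frame() dispatches to the
# model.matrix method, which keeps the matrix as a single column.
mm_tagged <- mm
class(mm_tagged) <- c("model.matrix", "matrix")
d_single <- data.frame(swiss, mm = mm_tagged)

ncol(d_split)                       # swiss columns plus one per matrix column
ncol(d_single)                      # swiss columns plus one
is.matrix(d_single[[ncol(d_single)]])  # the added column is still a matrix
```

Comparing the two ncol() results is a quicker check than reading the full str() output.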


> David Winsemius wrote:
>>
>>> ...
>>> It is just the assignment with "$" that does behave differently - and
>>> not only for poly objects but for any matrix object. After I
>>> eventually remembered how to get to the documentation of extractors
>>> (?"$<-.data.frame"), I found this behavior documented there in the
>>> section on Coercion. Nevertheless, this does seem to contradict the
>>> understanding of what a data frame is. I am aware that data frames
>>> are lists, but they are of course special lists, requiring that all
>>> list elements have the same number of rows. So far I thought that
>>> all list elements also have the same number of columns, namely just
>>> one. In fact, the documentation of function data.frame states that
>>>
>>> "A data frame is a list of variables of the same length with unique
>>> row names, given class "data.frame".",
>>>
>>> which would imply such a rule.
>>
>> Except that the same page asserts:
>>
>> "Note that when the replacement value is an array (including a
>> matrix) it is not treated as a series of columns (as data.frame and
>> as.data.frame do) but inserted as a single column."
>>
> This is the piece on coercion in the extract documentation I was also
> referring to.
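
The difference described in that passage can be sketched as follows. This is only an illustration under my own naming (dat_assign, dat_split and dat_asis are hypothetical objects, not from the thread): it contrasts assignment with $<-, a plain data.frame() call, and a data.frame() call with the matrix protected by I().

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
p <- poly(dat$X1, 3)                 # 10 x 3 matrix of class "poly"

# Assignment with $<- inserts the matrix as a single column.
dat_assign <- dat
dat_assign$X1poly <- p

# data.frame() treats the matrix as a series of columns.
dat_split <- data.frame(dat, X1poly = p)

# Wrapping the matrix in I() makes data.frame() keep it as one column.
dat_asis <- data.frame(dat, X1poly = I(p))

length(dat_assign)                   # two original columns plus one
length(dat_split)                    # two original columns plus three
length(dat_asis)                     # two original columns plus one
dim(dat_assign$X1poly)               # the single column is a 10 x 3 matrix
```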
>
>
> David Winsemius wrote:
>>
>> ... which is more on-point documentation than what I offered earlier.
>> I also found that the I() construct within data.frame() would
>> replicate the behavior of df$x <- <mtx> (as documented in
>> data.frame's help):
>> > dat2 <- data.frame(X1 = 1:10, X2 = LETTERS[1:10],
>> +                    X1poly = I(poly(dat$X1, 3)))
>> > length(dat2)
>> [1] 3
>> > dat2[1,3]
>>               1        2          3
>> [1,] -0.4954337 0.522233 -0.4534252
>> attr(,"class")
>> [1] "poly"   "matrix"
>>> The possibility of a matrix with more than one column being a column
>>> of the data frame contradicts this piece of documentation, since the
>>> length of the matrix is not the same as the length of the other
>>> columns (e.g. length(poly(dat$X1,3)) is 30, not 10 like for the
>>> other variables). Or would one consider the columns of the matrix
>>> X1poly the variables, but X1poly a column? I'm not trying to be
>>> difficult, I just find this quite confusing and wonder about the
>>> consequences when using such a data frame in analyses.
>>
>> There could be unforeseen consequences, but I am not the right person
>> to answer for all of those possibilities. I can see another instance
>> where it would be desirable to have tuples included in data.frames as
>> arrays, and that is in the representation of complex numbers, but it
>> appears that the internal representation of complex numbers is more
>> completely hidden from casual view than is the capacity of
>> data.frames to carry matrices. If you have a compelling argument to
>> change the behavior of [<-.data.frame, you will need to take it up
>> with the developers.
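
As one concrete data point on the "consequences in analyses" question, here is a small sketch under stated assumptions: the response y is made-up random noise and the seed is arbitrary, so only the shapes matter, not the fitted values. It shows that length() and NROW() disagree for the matrix column, and that lm() accepts the matrix column as a single term.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
dat <- data.frame(X1 = 1:10)
dat$X1poly <- poly(dat$X1, 3)        # one matrix column with 3 sub-columns
dat$y <- rnorm(10)                   # made-up response, illustration only

length(dat$X1poly)                   # counts matrix cells
NROW(dat$X1poly)                     # counts rows, which must match the frame
dim(dat)                             # the matrix counts as one column here

fit <- lm(y ~ X1poly, data = dat)    # the matrix column enters as one term
length(coef(fit))                    # intercept plus one per sub-column
```

So it is the number of rows that has to agree across the elements of the data frame, not length() as such, at least as far as this sketch goes.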
>>
> I have no idea which behavior is more useful; also, if this behavior
> has been around for a long time, changing it would presumably break
> some code. I suppose I would just opt for clearer documentation of the
> data frame class. The bugs interface is currently down; I may file a
> documentation wish or documentation bug later.
>
> Best regards, Ulrike
>
>
> -- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



