[R] poly objects as data frame columns

David Winsemius dwinsemius at comcast.net
Fri Jul 17 22:45:55 CEST 2009


On Jul 17, 2009, at 3:24 PM, Ulrike Grömping wrote:

>
> David,
>
> thanks. Your explanation does not quite fit, though, as it refers to
> using function data.frame, while I assigned the new column with $<-.
> poly() does return an object of classes poly and matrix, not
> model.matrix,

But model.matrix is not a class as far as I can tell. It has no  
"is.<>" function, and examining a sample model matrix does not  
indicate that it carries a special class attribute.
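
A minimal sketch for checking this in a session, assuming a reasonably current R version (the variable names x, p and mm are illustrative only). It inspects the class of a poly() result and looks for an explicit class attribute on a model matrix:

```r
x <- 1:10

# poly() returns a matrix that carries an explicit class attribute
p <- poly(x, 3)
class(p)                       # includes "poly" and "matrix"

# model.matrix() returns a matrix without an explicit class attribute
mm <- model.matrix(~ x + I(x^2))
attr(mm, "class")              # NULL
inherits(mm, "model.matrix")   # FALSE
```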

> and handing a poly object to function data.frame does behave like I
> would expect it to:
>
> dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
> dat <- data.frame(dat, X1poly = poly(dat$X1,3))
> dat         ## five columns displayed
> ncol(dat)  ## returns 5
> colnames(dat) ## returns a vector of 5 names
>
> It is just the assignment with "$" that does behave differently - and
> not only for poly objects but for any matrix object. After I eventually
> remembered how to get to the documentation of extractors
> (?"$<-.data.frame"), I found this behavior documented there in the
> section on Coercion. Nevertheless, this does seem to contradict the
> understanding of what a data frame is. I am aware that data frames are
> lists, but they are of course special lists, requiring that all list
> elements have the same number of rows. So far I thought that all list
> elements also have the same number of columns, namely just one. In
> fact, the documentation of function data.frame states that
>
> "A data frame is a list of variables of the same length with unique
> row names, given class "data.frame".",
>
> which would imply such a rule.

Except that the same page asserts:

"Note that when the replacement value is an array (including a matrix)  
it is not treated as a series of columns (as data.frame and  
as.data.frame do) but inserted as a single column."
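
The difference described in that note can be sketched as follows. This is only an illustration, assuming a plain numeric matrix m (a made-up example object) with as many rows as the data frame:

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
m <- matrix(rnorm(20), nrow = 10)   # hypothetical 10 x 2 matrix

# Replacement via "$<-": the matrix is inserted as a single column
a <- dat
a$M <- m

# Construction via data.frame(): the matrix is split into its columns
b <- data.frame(dat, M = m)

ncol(a)    # 3
ncol(b)    # 4
dim(a$M)   # 10 2
```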

... which is more on-point documentation than what I offered earlier.
I also found that wrapping the matrix in I() inside the data.frame()
call replicates the behavior of df$x <- <mtx> (as documented in
data.frame's help):
 > dat2 <- data.frame(X1=1:10, X2=LETTERS[1:10], X1poly = I(poly(dat$X1,3)))
 > length(dat2)
[1] 3
 > dat2[1,3]
               1        2          3
[1,] -0.4954337 0.522233 -0.4534252
attr(,"class")
[1] "poly"   "matrix"
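
If it helps, here is a rough sketch of how one might work with such an embedded matrix column: extracting individual terms and, if preferred, flattening the result back into one data frame column per matrix column. The object names dat2 and flat are illustrative, and the flattening step relies on data.frame() splitting matrices as described in its help page:

```r
dat2 <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat2$X1poly <- poly(dat2$X1, 3)   # one list element holding a 10 x 3 matrix

length(dat2)                      # 3 list elements
second <- dat2$X1poly[, 2]        # second polynomial term, a vector of length 10

# Passing the pieces through data.frame() again splits the matrix
flat <- data.frame(dat2[c("X1", "X2")], X1poly = dat2$X1poly)
ncol(flat)                        # 5
```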

> The possibility of a matrix with more than one column being a column
> of the data frame contradicts this piece of documentation, since the
> length of the matrix is not the same as the length of the other
> columns (e.g. length(poly(dat$X1,3)) is 30, not 10 like for the other
> variables). Or would one consider the columns of the matrix X1poly the
> variables, but X1poly a column? I'm not trying to be difficult, I just
> find this quite confusing and wonder about the consequences when using
> such a data frame in analyses.

There could be unforeseen consequences, but I am not the right person to  
answer for all of those possibilities. I can see another instance  
where it would be desirable to have tuples included in data.frames as  
arrays and that is in the representation of complex numbers, but it  
appears that the internal representation of complex numbers is more  
completely hidden from casual view than is the capacity of data.frames  
to carry matrices. If you have a compelling argument to change the  
behavior of [<-.data.frame, you will need to take it up with the  
developers.
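
As one data point on the question of consequences, here is a small sketch, not a survey of all cases. It assumes a simulated response y (invented for illustration) and shows that length() counts matrix cells while NROW() matches the other columns, and that lm() accepts the matrix column as a single term:

```r
set.seed(1)
dat <- data.frame(X1 = 1:10)
dat$y <- rnorm(10)               # hypothetical response, for illustration only
dat$X1poly <- poly(dat$X1, 3)    # matrix column added with "$<-"

length(dat$X1poly)               # 30: all cells of the 10 x 3 matrix
NROW(dat$X1poly)                 # 10: same number of rows as the other columns

# The matrix column enters the model as one term with three coefficients
fit <- lm(y ~ X1poly, data = dat)
length(coef(fit))                # 4: intercept plus three polynomial terms
```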

Best Regards;
David.

>
> Regards, Ulrike
>
>
> David Winsemius wrote:
>>
>> Dataframes are lists. Look at dat with str and you will see that the
>> third column (actually the third list element) is a matrix. It's not
>> hard to find the documentation. If you read the documentation on the
>> help page for data.frame you should see this:
>>
>> "If a list or data frame or matrix is passed to data.frame it is as
>> if each component or column had been passed as a separate argument
>> (except for matrices of class "model.matrix" and those protected by
>> I)."
>>
>> It seems reasonable that poly() returns an object that is considered
>> a model.matrix.
>>
>> On Jul 17, 2009, at 12:54 PM, Ulrike Grömping wrote:
>>
>>>
>>> Dear UseRs,
>>>
>>> I just learnt that the number of columns of a data frame is not
>>> always what I thought it to be, and I wonder where I should have
>>> learnt about this. Consider the following example:
>>>
>>> dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
>>> ncol(dat)          ## evaluates to 2 (of course)
>>> dat$X1poly <- poly(dat$X1,3)
>>> dat                  ## five columns displayed
>>> ncol(dat)          ## evaluates to 3
>>> colnames(dat)   ## three names (third is X1poly)
>>> colnames(dat)[3] <- "newname"
>>> dat                 ## all three previous X1poly columns renamed
>>>
>>> This appears intentional, as it treats the column names reasonably.
>>> Where is it documented? Are there any other scenarios for which the
>>> number of columns displayed when printing a data frame does not
>>> coincide with ncol?
>>>
>>> Regards, Ulrike
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



