[R] poly objects as data frame columns

Ulrike Grömping groemp at tfh-berlin.de
Fri Jul 17 21:24:21 CEST 2009


David, 

thanks. Your explanation does not quite fit, though, as it refers to calling
data.frame(), whereas I assigned the new column with $<-. poly() returns an
object of classes "poly" and "matrix", not "model.matrix", and handing a
poly object to data.frame() behaves as I would expect:

dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
dat <- data.frame(dat, X1poly = poly(dat$X1,3))
dat         ## five columns displayed
ncol(dat)  ## returns 5
colnames(dat) ## returns a vector of 5 names
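
The class claim and the data.frame() behaviour can be checked directly. This
is only a minimal sketch; the object names p, d1 and d2 are made up for
illustration, and the I() line merely tries out the exception mentioned in
the help text David quoted:

```r
p <- poly(1:10, 3)
class(p)                      # "poly" "matrix"
inherits(p, "model.matrix")   # FALSE

# passed to data.frame(), the matrix is split into its columns
d1 <- data.frame(X1 = 1:10, X1poly = p)
ncol(d1)                      # 4

# protected by I(), it stays a single matrix column
d2 <- data.frame(X1 = 1:10, X1poly = I(p))
ncol(d2)                      # 2
```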

It is only the assignment with "$" that behaves differently, and not only
for poly objects but for any matrix object. After I eventually remembered
how to get to the documentation of the extractors (?"$<-.data.frame"), I
found this behavior documented there in the section on Coercion.
Nevertheless, it seems to contradict the usual understanding of what a data
frame is. I am aware that data frames are lists, but they are of course
special lists, requiring that all list elements have the same number of
rows. So far I thought that all list elements also have the same number of
columns, namely just one. In fact, the documentation of function data.frame
states that

"A data frame is a list of variables of the same length with unique row
names, given class "data.frame".",

which would imply such a rule. A matrix with more than one column being a
column of the data frame contradicts this piece of documentation, since the
length of the matrix is not the same as the length of the other columns
(e.g. length(poly(dat$X1,3)) is 30, not 10 as for the other variables). Or
would one consider the columns of the matrix X1poly the variables, but
X1poly a column? I'm not trying to be difficult; I just find this quite
confusing and wonder about the consequences of using such a data frame in
analyses.
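
To make the mismatch concrete, here is a small sketch of what I mean. The
names flat, y and fit are made up for illustration, the response y is just
random noise, and do.call(data.frame, ...) is only one possible way to get
back to one-column variables, not necessarily the recommended one:

```r
dat <- data.frame(X1 = 1:10, X2 = LETTERS[1:10])
dat$X1poly <- poly(dat$X1, 3)

ncol(dat)             # 3: the matrix counts as one column
dim(dat$X1poly)       # 10 3
length(dat$X1poly)    # 30, unlike length(dat$X1), which is 10
sapply(dat, NROW)     # 10 for every element: rows agree, lengths do not

# re-passing the elements to data.frame() splits the matrix again
flat <- do.call(data.frame, dat)
ncol(flat)            # 5

# a model formula appears to accept the matrix column as a single term
set.seed(1)
dat$y <- rnorm(10)    # hypothetical response, for illustration only
fit <- lm(y ~ X1poly, data = dat)
length(coef(fit))     # 4: intercept plus three polynomial terms
```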

Regards, Ulrike


David Winsemius wrote:
> 
> Dataframes are lists. Look at dat with str and you will see that the  
> third column (actually the third list element) is a matrix. It's not  
> hard to find the documentation. If you read the documentation on the  
> help page for data.frame you should see this:
> 
> "If a list or data frame or matrix is passed to data.frame it is as if  
> each component or column had been passed as a separate argument  
> (except for matrices of class "model.matrix" and those protected by I)."
> 
> It seems reasonable that poly() returns an object that is considered a  
> model.matrix.
> 
> On Jul 17, 2009, at 12:54 PM, Ulrike Grömping wrote:
> 
>>
>> Dear UseRs,
>>
>> I just learnt that the number of columns of a data frame is not always
>> what I thought it to be, and I wonder where I should have learnt about
>> this.
>> Consider the following example:
>>
>> dat <- data.frame(X1=1:10, X2=LETTERS[1:10])
>> ncol(dat)          ## evaluates to 2 (of course)
>> dat$X1poly <- poly(dat$X1,3)
>> dat                  ## five columns displayed
>> ncol(dat)          ## evaluates to 3
>> colnames(dat)   ## three names (third is X1poly)
>> colnames(dat)[3] <- "newname"
>> dat                 ## all three previous X1poly columns renamed
>>
>> This appears intentional, as it treats the column names reasonably.
>> Where is it documented? Are there any other scenarios for which the
>> number of columns displayed when printing a data frame does not coincide
>> with ncol?
>>
>> Regards, Ulrike
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
