[R] Can a data.frame column contain lists/arrays?

hadley wickham h.wickham at gmail.com
Wed Feb 14 06:07:19 CET 2007


> I'd like to have a data.frame structured something like the following:
>
> d <- data.frame (
>    x=list( c(1,2), c(5,2), c(9,1) ),
>    y=c( 1, -1, -1)
> )
>
> The reason is this: 'd' is the training data for a machine learning
> algorithm.  d$x is the independent data, and d$y is the dependent
> data.
>
> In general my machine learning code will work where each element of
> d$x is a vector of one or more real numbers.  So for instance, the
> same code should work when d$x[1] = 42, or when d$x[1] = (42, 3, 5).
> All that matters is that all elements within d$x are lists/vectors of
> the same length.
>
> Does anyone know if/how I can get a data.frame set up like that?

You certainly can, although it requires a little work. A data.frame is
a list of vectors, each of the same length, and a list is a type of
vector.   I use this structure fairly often in my own work, and find
it quite useful.
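
A quick way to check both claims at the console is sketched below. The
object names (plain_df, list_col, cols) are made up for illustration only.

```r
# A data.frame is stored as a list whose elements are the columns.
plain_df <- data.frame(a = 1:2)
is_df_a_list <- is.list(plain_df)

# A plain list counts as a vector in R's sense of the word.
list_col <- list(1:5, 6:10)
is_list_a_vector <- is.vector(list_col)

# Each would-be column has length 2, even though the elements of
# list_col themselves have length 5.
cols <- list(a = 1:2, b = list_col)
col_lengths <- vapply(cols, length, integer(1))
```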

However, the data.frame and as.data.frame functions try to be helpful
by converting lists into regular columns, so you must first create your
data.frame and then add the list column afterwards:

> df <- data.frame(a=1:2)
> df$b <- list(1:5, 6:10)
> df
  a              b
1 1  1, 2, 3, 4, 5
2 2 6, 7, 8, 9, 10

> str(df)
'data.frame':   2 obs. of  2 variables:
 $ a: int  1 2
 $ b:List of 2
  ..$ : int  1 2 3 4 5
  ..$ : int  6 7 8 9 10

but

> data.frame(a=1:2, b = list(1:5, 6:10))
Error in data.frame(a = 1:2, b = list(1:5, 6:10)) :
        arguments imply differing number of rows: 2, 5
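
Applied to the data from the question, the same two-step approach might
look like the sketch below. The helper names (lens, X, d2) are
illustrative. Wrapping the list in I() is an alternative that is not
part of the recipe above; it relies on data.frame() leaving "AsIs"
objects untouched, so treat it as an optional variant.

```r
# Build the frame from the ordinary column first, then attach the list.
d <- data.frame(y = c(1, -1, -1))
d$x <- list(c(1, 2), c(5, 2), c(9, 1))

# The learning code needs every element of d$x to have the same length.
lens <- vapply(d$x, length, integer(1))
stopifnot(length(unique(lens)) == 1)

# If a plain numeric matrix is more convenient, stack the elements by row.
X <- do.call(rbind, d$x)

# Alternative (assumption: acceptable for your use): protect the list
# with I() so data.frame() does not try to split it into columns.
d2 <- data.frame(
  y = c(1, -1, -1),
  x = I(list(c(1, 2), c(5, 2), c(9, 1)))
)
```

Individual observations are then available as d$x[[i]], and d$y[i] is
the matching response.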

Note that it is possible to create structures like this that cannot be
printed but still contain valid objects:

> df$b <- list(lm(mpg~wt, data=mtcars), lm(mpg~vs, data=mtcars))
> df
Error in unlist(x, recursive, use.names) :
        argument not a list

> summary(df[1,2])
     Length Class Mode
[1,] 12     lm    list
> summary(df[1,2][[1]])

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-Squared: 0.7528,     Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
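
Even when the whole data.frame will not print, the list column can be
processed element by element. The following is only a sketch; the names
first_model, coefs and r2 are invented for the example.

```r
df <- data.frame(a = 1:2)
df$b <- list(lm(mpg ~ wt, data = mtcars), lm(mpg ~ vs, data = mtcars))

# [[ ]] pulls a single fitted model out of the list column.
first_model <- df$b[[1]]

# lapply() keeps one result per row, here the coefficient vectors.
coefs <- lapply(df$b, coef)

# vapply() collapses one number per model into an ordinary column.
r2 <- vapply(df$b, function(m) summary(m)$r.squared, numeric(1))
df$r2 <- r2
```

Storing a per-model summary such as r2 back in the data.frame keeps the
models and the numbers derived from them in the same row.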



There are some functions in the reshape package, in particular stamp,
which make this a bit easier for particular types of data.

Regards,

Hadley


