[R] Strange data frame

Liaw, Andy andy_liaw at merck.com
Fri Apr 22 02:59:40 CEST 2005



> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of McGehee, Robert
> Sent: Thursday, April 21, 2005 7:03 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Strange data frame
> 
> 
> Hello, 
> I'm playing around with the PLS package and found a data set (NIR) whose
> structure I don't understand. Forgive me if this is a stupid question,
> as I feel like it must be since I am less experienced with aspects of
> modeling. 
> 
> My problem: the pls NIR data frame does not seem to be a typical data
> frame as, while it is a list, its variables are not of equal length.
> Furthermore, I have no idea how to reproduce such a structure.
> 
> But, let's look at the NIR data...
> 
> > require(pls)
> > data(NIR)
> > class(NIR)
> [1] "data.frame"
> 
> > str(NIR)
> `data.frame':	28 obs. of  3 variables:
>  $ X    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : NULL
>   .. ..$ : NULL
>  $ y    : num  100.0  80.2  79.5  60.8  60.0 ...
>  $ train: logi  TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
> 
> > class(NIR$X)
> [1] "matrix"
> > class(NIR$y)
> [1] "numeric"
> 
> > length(NIR$X)
> [1] 7504
> > length(NIR$y)
> [1] 28
> 
> Ok, what this looks like to me is that NIR is a data frame (i.e. "a list
> of variables of the same length with unique row names"), with a matrix
> of length 7504 as one variable, and a numeric vector of length 28 as
> another variable, which seems to contradict the definition of a data
> frame.
> 
> Moreover, despite my best efforts, I'm unable to put any of my own data
> in this structure, as the data.frame() and as.data.frame() functions
> remove the matrix structure, i.e.
> > data.frame(y = NIR$y, X = NIR$X) 			## or 
> > as.data.frame(list(y = NIR$y, X = NIR$X))
> return a different animal altogether.

A variable in a data frame can be a matrix, provided its number of rows
matches that of the data frame.  Here's one possible way to do that:

> dat <- data.frame(y=1:2)
> dat$x <- matrix(runif(4),2)
> str(dat)
`data.frame':   2 obs. of  2 variables:
 $ y: int  1 2
 $ x: num [1:2, 1:2] 0.562 0.670 0.738 0.903

If the number of rows doesn't match, you get:

> dat$x <- matrix(runif(6),3)
Error in "$<-.data.frame"(`*tmp*`, "x", value = c(0.669958727201447,
0.111689866287634,  : 
        replacement has 3 rows, data has 2
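
Another way, if you want to build the data frame in one call, is to wrap
the matrix in I() so that data.frame() keeps it as a single variable
rather than splitting it into columns.  A minimal sketch follows; the
names y, X, dat2 and flat and the toy values are made up for
illustration and have nothing to do with the NIR data:

```r
# Toy data, purely illustrative (not from the pls package)
y <- c(1.5, 2.5, 3.5)
X <- matrix(1:6, nrow = 3)

# I() protects the matrix, so it stays one variable with 3 rows
dat2 <- data.frame(y = y, X = I(X))
dim(dat2)     # 3 observations of 2 variables
dim(dat2$X)   # the matrix keeps its 3 x 2 shape

# Without I(), each matrix column becomes its own variable
flat <- data.frame(y = y, X = X)
names(flat)   # y plus one name per column of X
```

Note that the protected column carries the extra class "AsIs", which the
dat$x <- matrix(...) assignment shown earlier does not add.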
 
 
> Lastly, this particular structure is useful, because the PLS authors are
> able to concisely write models such as,
> 
> mvr(y ~ X, data = NIR[NIR$train, ])
> 
> instead of what I imagine would be a more complicated alternative if
> they didn't have a data frame of a matrix and a vector as they do. Any
> pointers to something I overlooked are appreciated.

Many modeling functions will accept matrix predictors, including
lm()/glm()/rpart()/etc.
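
For instance, with lm() the pattern might look like the sketch below.
The data are simulated and the names dat, X and fit are only for
illustration; the row subsetting mirrors the NIR[NIR$train, ] idiom:

```r
set.seed(1)

# Simulated data frame with a 10 x 2 matrix stored as one variable
dat <- data.frame(y = rnorm(10))
dat$X <- matrix(rnorm(20), nrow = 10)

# Row subsetting carries the matrix rows along with y
fit <- lm(y ~ X, data = dat[1:8, ])

# One intercept plus one coefficient per column of X
length(coef(fit))
nrow(model.frame(fit))
```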

Andy
 
> Best,
> Robert
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>



