[R] Data.frame Vs Matrix Vs Array: Definitions Please

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 27 02:49:09 CEST 2010


On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio <matt.curcio.ri at gmail.com> wrote:
> Hi All,
> I am learning R and having a little trouble with the usage and proper
> definitions of data.frames vs. matrix vs vectors. I have read many R
> tutorials, and looked over umpteen 'cheat' sheets and have found that
> no one has articulated a really good definition of the differences
> between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
> realize that I might have missed someone's R tutorial, and actually
> would like to receive 'your' most concise or most useful tutorial.
> Any help would be appreciated.
>
> My particular favorite explanation and helpful hint is from the
> 'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
> some tables are excellent. Overall it is a very good primer, but this
> one section leaves me puzzled.  This quote reveals the lack of hard and
> fast rules for what and when to use 'data.frames', 'matrix', and
> 'arrays'.  It discusses ways in which to simplify your work.
>
> Here are a few possibilities for simplifying:
> • Don’t use a list when an atomic vector will do.
> • Don’t use a data frame when a matrix will do.
> • Don’t try to use an atomic vector when a list is needed.
> • Don’t try to use a matrix when a data frame is needed.
>
> Cheers,
> Matt C

Look at their internal representations and it will become clearer.  v,
a vector, has length 6.  m, a matrix, is actually the same as the
vector v except that it has dimensions too.  Since m is just a vector
with dimensions, m has length 6 as well.  L is a list and has length 2
because it's a vector each of whose components is itself a vector.  DF
is a data frame and is the same as L except that its 2 components must
each have the same length and it must have row and column names.  If
you don't assign the row and column names, they are generated
automatically, as we can see.  Note that row.names = c(NA, -3L) is a
short form for the row names 1:3, and .Names internally refers to the
column names.

> v <- 1:6 # vector
> dput(v)
1:6
>
> m <- v; dim(m) <- 2:3 # m is a matrix since we added dimensions
> dput(m)
structure(1:6, .Dim = 2:3)
>
> L <- list(1:3, 4:6)
> dput(L)
list(1:3, 4:6)
>
> DF <- data.frame(1:3, 4:6)
> dput(DF)
structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
), row.names = c(NA, -3L), class = "data.frame")
>
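The claims in the paragraph above can be checked directly with length(),
attributes() and as.list().  This is a minimal sketch assuming a reasonably
recent R; the 3-d array `a` is an illustrative addition (not in the post)
showing that an array is the same "vector plus dim attribute" idea as a
matrix, just with more dimensions.

```r
# Rebuild the four objects from the session above.
v  <- 1:6                    # atomic integer vector
m  <- v; dim(m) <- 2:3       # same six values plus a "dim" attribute
L  <- list(1:3, 4:6)         # a list: a vector whose components are vectors
DF <- data.frame(1:3, 4:6)   # equal-length columns plus names/row.names/class

# length() counts elements of an atomic vector and components of a list.
print(length(v))    # 6
print(length(m))    # 6: a matrix is still a vector of six elements
print(length(L))    # 2: one per component
print(length(DF))   # 2: one per column, just like L

# A matrix is v plus a dim attribute; dropping it gives v back.
print(attributes(m))
print(identical(as.vector(m), v))   # TRUE

# Illustrative: an array is the same idea with any number of dimensions.
a <- 1:24; dim(a) <- 2:4
print(is.array(m))    # TRUE: a matrix is a 2-d array
print(is.matrix(a))   # FALSE: a has 3 dimensions
print(length(a))      # 24

# A data frame is L plus column names, row names and a class attribute.
print(names(DF))      # "X1.3" "X4.6"
print(rownames(DF))   # "1" "2" "3"
print(identical(unname(as.list(DF)), L))   # TRUE
```

Stripping the data-frame-specific attributes (as.list() drops the class and
row names, unname() drops the column names) leaves exactly the list L.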


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


