[R] Data.frame Vs Matrix Vs Array: Definitions Please

Ivan Calandra ivan.calandra at uni-hamburg.de
Wed Oct 27 14:38:16 CEST 2010


My comments are in the text.

On 10/27/2010 12:11, Gabor Grothendieck wrote:
> On Wed, Oct 27, 2010 at 4:03 AM, Ivan Calandra
> <ivan.calandra at uni-hamburg.de>  wrote:
>> Hi,
>>
>> Gabor gave you a great answer already, but I would add a few clarifications.
>> Someone please correct me if I'm wrong.
>>
>> Arrays are matrices with more than 2 dimensions. Put the other way: matrices
>> are arrays with only 2 dimensions.
> Arrays can have any number of dimensions including 1, 2, 3, etc.
>
> 	>  # a 2d array is a matrix. It's composed of a vector plus two dimensions.
> 	>  m<- array(1:4, c(2, 2))
> 	>  dput(m)
> 	structure(1:4, .Dim = c(2L, 2L))
> 	>  class(m)
> 	[1] "matrix"
> 	>  is.array(m)
> 	[1] TRUE
>
> 	>  # a 1d array is a vector plus a single dimension
> 	>  a1<- array(1:4, 4)
> 	>  dput(a1)
> 	structure(1:4, .Dim = 4L)
> 	>  dim(a1)
> 	[1] 4
> 	>  class(a1)
> 	[1] "array"
> 	>  is.array(a1)
> 	[1] TRUE
>
> 	>  # if we remove the dimension part it's no longer an array but just a vector
> 	>  nota<- a1
> 	>  dim(nota)<- NULL
> 	>  dput(nota)
> 	1:4
> 	>  is.array(nota)
> 	[1] FALSE
> 	>  is.vector(nota)
> 	[1] TRUE
What I don't understand is why vectors (with more than one value) don't 
have dimensions. They look like they have one dimension. To me, no 
dimension would mean a scalar. As in geometry: a point has no dimension, 
a line has 1, a square has 2, a cube has 3, and so on. Is this because of 
how vectors are stored internally? Is the intuitive geometric way of 
thinking simply not relevant in programming terms?
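
To make the question concrete, here is a minimal sketch of what the console shows. As far as I can tell, R has no separate scalar type: a single number is just a vector of length 1, and a plain vector records its extent with length() rather than with a dim attribute. The variable names are only illustrative.

```r
v <- 1:4
dim_of_vector <- dim(v)      # NULL: a plain vector carries no dim attribute
len_of_vector <- length(v)   # 4: length, not dim, records its extent

x <- 5
# a "scalar" is just a vector of length 1
scalar_is_vector <- is.vector(x) && length(x) == 1

a1 <- v
dim(a1) <- 4                 # adding a dim attribute makes it a 1-d array
same_data <- identical(as.vector(a1), v)   # the underlying data are unchanged
```

So the dimension is an optional attribute attached to the data, not a property every vector carries.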


>> I would also add these:
>> - the components of a vector have to be of the same mode (character,
>> numeric, integer...)
> however, a list with no attributes is a vector too so this is a vector:
>
>     >   vl<- list(sin, 3, "a")
>     >   is.vector(vl)
>     [1] TRUE
>
> A vector may not have attributes so arrays and factors are not vectors
> although they are composed from vectors.
That's also completely unexpected for me! What is then a vector?! And 
then the difference between a vector and a list?! I mean, in practice, 
it's not so important, my understanding is probably enough for what I'm 
doing in R, but I'd like to understand how it works.

Also, you wrote that a vector may not have attributes. I might be wrong 
(and probably am), but aren't names attributes? If so, why is a named list 
still a vector?
my.list <- list(num=1:3, let=LETTERS[1:2])
names(my.list)
[1] "num" "let"
is.vector(my.list)
[1] TRUE
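
A short sketch that seems to narrow this down: according to ?is.vector, the function returns TRUE only when the object has no attributes other than names, so names look like the one permitted exception. The "note" attribute below is a made-up example, not anything R defines.

```r
my.list <- list(num = 1:3, let = LETTERS[1:2])
names_only <- is.vector(my.list)       # TRUE: names do not count against it

tagged <- my.list
attr(tagged, "note") <- "hypothetical extra attribute"
with_extra_attr <- is.vector(tagged)   # FALSE: any other attribute does count
still_list <- is.list(tagged)          # TRUE: it is still a list underneath

# the vector/list distinction: atomic vectors hold one mode, lists hold anything
atomic_int <- is.atomic(1:3)           # TRUE
atomic_list <- is.atomic(my.list)      # FALSE
```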
>> - which implies that the components of matrices and arrays have to be also
>> of the same mode (which might lead to some coercion of your data if you
>> don't pay attention to it).
>>
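
A minimal sketch of the coercion mentioned above, assuming nothing beyond base R. Mixing numbers and strings in one atomic vector or matrix silently turns everything into character; the column names id and label are only illustrative.

```r
mixed <- c(1, "a", TRUE)        # every element becomes a character string
mixed_mode <- mode(mixed)       # "character"

m <- cbind(id = 1:2, label = c("x", "y"))
matrix_mode <- mode(m)          # "character": the integers were coerced too
id_column <- unname(m[, "id"])  # "1" "2", no longer numbers
```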
>> Factors are character data, but stored in numeric mode. Each number is
>> associated with a given string, one of the so-called levels. Here is an example:
>> my.fac<- factor(c("something", "other", "more", "something", "other",
>> "more"))
> A factor is composed of an integer vector plus a levels attribute
> (called .Label internally) as in this code:
>
>     >  fac<- factor(c("b", "a", "b"))
>     >  dput(fac)
>     structure(c(2L, 1L, 2L), .Label = c("a", "b"), class = "factor")
>     >  levels(fac)
>     [1] "a" "b"
I like this explanation of a factor; I couldn't find those exact words myself!
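
One practical consequence of the "integer codes plus levels" view, as a minimal sketch: converting a factor of digit strings directly to numbers returns the codes, not the values. The variable names are only illustrative.

```r
f <- factor(c("10", "20", "10"))
codes <- as.integer(f)                     # 1 2 1: the internal integer codes
values <- as.numeric(as.character(f))      # 10 20 10: go through the labels
values_via_levels <- as.numeric(levels(f))[f]   # same result, indexing the levels
```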

Thanks for the clarifications anyway!
Ivan

>> my.fac
>>   [1] something other     more      something other     more
>>   Levels: more other something
>> mode(my.fac)
>>   [1] "numeric"    ## coded as numeric even though you gave character strings!
>> class(my.fac)
>>   [1] "factor"
>> levels(my.fac)
>>   [1] "more"      "other"     "something"
>> as.numeric(my.fac)
>>   [1] 3 2 1 3 2 1                  ## internal representation
>> as.character(my.fac)
>> [1] "something" "other"     "more"      "something" "other"     "more"    ## what you think it is!
>>
>> I found that the book "Data Manipulation with R" by Phil Spector (2008)
>> does a good job of explaining all these object modes and classes, even
>> though I wouldn't have understood them completely from this book alone (not
>> that I have completely mastered the topic yet...)
>>
>> HTH,
>> Ivan
>>
>>
>>
>> On 10/27/2010 02:49, Gabor Grothendieck wrote:
>>> On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio<matt.curcio.ri at gmail.com>
>>>   wrote:
>>>> Hi All,
>>>> I am learning R and having a little trouble with the usage and proper
>>>> definitions of data.frames vs. matrix vs vectors. I have read many R
>>>> tutorials, and looked over umpteen 'cheat' sheets and have found that
>>>> no one has articulated a really good definition of the differences
>>>> between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
>>>> realize that I might have missed someone's R tutorial, and actually
>>>> would like to receive 'your' most concise or most useful tutorial.
>>>> Any help would be appreciated.
>>>>
>>>> My particular favorite explanation and helpful hint is from the
>>>> 'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
>>>> some tables are excellent. Overall it is a very good primer but this
>>>> one section leaves me puzzled.  This quote illustrates the lack of hard and
>>>> fast rules for what and when to use 'data.frames', 'matrix', and
>>>> 'arrays'.  It discusses ways in which to simplify your work.
>>>>
>>>> Here are a few possibilities for simplifying:
>>>> • Don’t use a list when an atomic vector will do.
>>>> • Don’t use a data frame when a matrix will do.
>>>> • Don’t try to use an atomic vector when a list is needed.
>>>> • Don’t try to use a matrix when a data frame is needed.
>>>>
>>>> Cheers,
>>>> Matt C
>>> Look at their internal representations and it will become clearer.  v,
>>> a vector, has length 6.  m, a matrix, is actually the same as the
>>> vector v except it has dimensions too. Since m is just a vector with
>>> dimensions, m has length 6 as well.  L is a list and has length 2
>>> because it's a vector each of whose components is itself a vector.  DF
>>> is a data frame and is the same as L except its 2 components must each
>>> have the same length and it must have row and column names.  If you
>>> don't assign the row and column names they are automatically generated
>>> as we can see.  Note that row.names = c(NA, -3L) is a short form for
>>> row names of 1:3 and .Names internally refers to column names.
>>>
>>>> v<- 1:6 # vector
>>>> dput(v)
>>> 1:6
>>>> m<- v; dim(m)<- 2:3 # m is a matrix since we added dimensions
>>>> dput(m)
>>> structure(1:6, .Dim = 2:3)
>>>> L<- list(1:3, 4:6)
>>>> dput(L)
>>> list(1:3, 4:6)
>>>> DF<- data.frame(1:3, 4:6)
>>>> dput(DF)
>>> structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
>>> ), row.names = c(NA, -3L), class = "data.frame")
>>>
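
The description above can be checked with a small sketch. The column names a, b, n and s are made up for illustration; everything else is base R. It also hints at why a data frame is needed when the columns differ in type: converting such a data frame to a matrix coerces everything to character.

```r
v <- 1:6
m <- v; dim(m) <- 2:3
L <- list(1:3, 4:6)
DF <- data.frame(a = 1:3, b = 4:6)

# the matrix has the length of its underlying vector;
# the list and the data frame count components (columns)
lengths_seen <- c(v = length(v), m = length(m), L = length(L), DF = length(DF))

df_is_list <- is.list(DF)    # TRUE: a data frame is a list with extra attributes
df_dim <- dim(DF)            # 3 rows, 2 columns

# columns of different types survive in a data frame but not in a matrix
mixed <- data.frame(n = 1:2, s = c("x", "y"))
mixed_matrix_mode <- mode(as.matrix(mixed))   # "character"
```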
>> --
>> Ivan CALANDRA
>> PhD Student
>> University of Hamburg
>> Biozentrum Grindel und Zoologisches Museum
>> Abt. Säugetiere
>> Martin-Luther-King-Platz 3
>> D-20146 Hamburg, GERMANY
>> +49(0)40 42838 6231
>> ivan.calandra at uni-hamburg.de
>>
>> **********
>> http://www.for771.uni-bonn.de
>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



