[R] two questions for R beginners

Petr PIKAL petr.pikal at precheza.cz
Wed Mar 3 16:36:45 CET 2010


Hi

That is why I think of a matrix as just a vector with dimensions, and of a 
data.frame as a rectangular structure similar to an Excel table. That has 
saved me a lot of surprises. 
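
A minimal sketch of that mental model, using the same 3 x 4 example as in 
the quoted message below (the names v, m and d are only illustrative): 
giving a plain vector a dim attribute yields the same object that matrix() 
builds, whereas the data frame is stored as a list of columns.

```r
v <- 1:12
dim(v) <- c(3, 4)                      # a vector plus a dim attribute
m <- matrix(1:12, nrow = 3, ncol = 4)  # the usual constructor
d <- as.data.frame(m)

identical(v, m)   # TRUE: same data, same dim attribute
typeof(m)         # "integer": one atomic type for all cells
typeof(d)         # "list": one component per column
length(m)         # 12: counts cells
length(d)         # 4: counts columns
```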

But I must admit I am no longer a real beginner, although I still learn 
when using R, reading the help list, and sometimes trying to help others.

Regards
Petr


"John Sorkin" <jsorkin at grecc.umaryland.edu> wrote on 03.03.2010 16:30:39:

> Petr,
> On the other hand . . .
> 
> > mat<-matrix(1:12, 3,4)
> > dat<-as.data.frame(mat)
> > mat
>      [,1] [,2] [,3] [,4]
> [1,]    1    4    7   10
> [2,]    2    5    8   11
> [3,]    3    6    9   12
> > dat
>   V1 V2 V3 V4
> 1  1  4  7 10
> 2  2  5  8 11
> 3  3  6  9 12
> 
> What you are demonstrating by your example is the manner in which the data
> are organized deep in the guts of R, not the way people, especially R
> beginners, visualize objects in their mind. When I think of the integer
> sixty-nine, I visualize 69, not 1000101, despite the fact that 69, as an
> integer, is represented in the computer as 1000101.
> John
> 
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> 
> >>> Petr PIKAL <petr.pikal at precheza.cz> 3/3/2010 9:44 AM >>>
> "John Sorkin" <jsorkin at grecc.umaryland.edu> wrote on 01.03.2010 15:19:10:
> 
> > If it looks like a duck and quacks like a duck, it ought to behave like
> > a duck.
> > 
> > To the user a matrix and a dataframe look alike . . . except a dataframe
> > can
> 
> Well, a matrix looks like a data.frame only at first sight.
> 
> mat<-matrix(1:12, 3,4)
> dat<-as.data.frame(mat)
> 
> 
> str(dat)
> 'data.frame':   3 obs. of  4 variables:
>  $ V1: int  1 2 3
>  $ V2: int  4 5 6
>  $ V3: int  7 8 9
>  $ V4: int  10 11 12
> 
> str(mat)
>  int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
> 
> That looks pretty different to me.
> 
> Regards
> Petr
> 
> 
> > hold non-numeric values. Thus to the users, a matrix looks like a special
> > case of a DF, or perhaps conversely. If you can address elements of one
> > structure using a given syntax, you should be able to address elements of
> > the other structure using the same syntax. To do otherwise leads to
> > confusion and is counterintuitive.
> > John
> > 
> > >>> Petr PIKAL <petr.pikal at precheza.cz> 3/1/2010 8:57 AM >>>
> > Hi
> > 
> > r-help-bounces at r-project.org wrote on 01.03.2010 13:03:24:
> > 
> > < snip>
> > 
> > > > 
> > > > I understand that a two-dimensional rectangular matrix looks quite
> > > > similar to a data frame; however, it is only a vector with dimensions.
> > > > As such it can hold items of only one type (numeric, character, ...).
> > > > And you can easily change the dimensions of a matrix.
> > > > 
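> > > > matrix<-1:12                # a plain integer vector, no dimensions
> > > > dim(matrix) <- c(2,6)       # now printed as a 2 x 6 matrix
> > > > matrix
> > > > dim(matrix) <- c(2,2,3)     # now a 2 x 2 x 3 array
> > > > matrix
> > > > dim(matrix) <-NULL          # back to a plain vector
> > > > matrix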
> > > > 
> > > > So the rectangular structure of a printed matrix is a kind of
> > > > coincidence only, whereas the rectangular structure of a data frame
> > > > is its main feature.
> > > > 
> > > > Regards
> > > > Petr
> > > >> 
> > > >> -- 
> > > >> Karl Ove Hufthammer
> > > 
> > > Petr, I think that could be confusing! The way I see it is that
> > > a matrix is a special case of an array, whose "dimension" attribute
> > > is of length 2 (number of "rows", number of "columns"); and "row"
> > > and "column" refer to the rectangular display which you see when
> > > R prints the matrix. And this, of course, derives directly from
> > > the historic rectangular view of a matrix when written down.
> > > 
> > > When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
> > > you stripped it of its special title of "matrix" and cast it out
> > > into the motley mob of arrays (some of whom are matrices, but
> > > "matrix" no longer is).
> > > 
> > > So the "rectangular structure of printed matrix" is not a coincidence,
> > > but is its main feature!
> > 
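The distinction drawn above can be checked directly with is.matrix() and 
is.array(). A small sketch, replaying the same dim changes (x is an 
illustrative name, chosen here so that the matrix() function is not masked):

```r
x <- 1:12
dim(x) <- c(2, 6)
is.matrix(x)      # TRUE: the dim attribute has length 2

dim(x) <- c(2, 2, 3)
is.matrix(x)      # FALSE: no longer a matrix ...
is.array(x)       # TRUE:  ... but still an array

dim(x) <- NULL
is.array(x)       # FALSE: the dim attribute is gone
is.vector(x)      # TRUE: a plain vector again
```

The data never change in these steps; only the dim attribute does.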
> > Ok. Point taken. However, I feel that the possibility of manipulating
> > matrix/array dimensions by simply changing them, as I showed above,
> > together with perceiving a matrix as a **vector with dimensions**,
> > prevented me, especially in the early days, from using matrices instead
> > of data frames and vice versa.
> > 
> > Consider the confusing results of cbind and rbind for vectors of unequal
> > mode. Far too often we see something like this
> > 
> > > cbind(1:2,letters[1:2])
> >      [,1] [,2]
> > [1,] "1"  "a" 
> > [2,] "2"  "b" 
> > 
> > instead of
> > 
> > > data.frame(1:2,letters[1:2])
> >   X1.2 letters.1.2.
> > 1    1            a
> > 2    2            b
> > 
> > and then a question about why the result does not behave as expected.
> > Each type of object has features that are good for some types of
> > manipulation/analysis/plotting but quite detrimental for others.
> > 
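A short sketch of why the cbind() result surprises people: the whole matrix 
is coerced to one type, while the data frame keeps a type per column. The 
column names num and chr are illustrative, and stringsAsFactors = FALSE is 
set explicitly as an assumption so the result does not depend on the R 
version's default.

```r
m <- cbind(1:2, letters[1:2])
typeof(m)             # "character": the integers were coerced to strings

# stringsAsFactors = FALSE keeps the letters as character in any R version
d <- data.frame(num = 1:2, chr = letters[1:2], stringsAsFactors = FALSE)
sapply(d, typeof)     # num is "integer", chr is "character"

sum(d$num)            # 3: arithmetic works on the data frame column

# The same arithmetic on the matrix column fails, because it holds strings
res <- tryCatch(sum(m[, 1]), error = function(e) conditionMessage(e))
res                   # an error message rather than a number
```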
> > Regards
> > Petr
> > 
> > 
> > > 
> > > To come back to Karl's query about why "$" works for a dataframe
> > > but not for a matrix, note that "$" is the extractor for getting
> > > a named component of a list. So, Karl, when you did
> > > 
> > >   d=head(iris[1:4])
> > > 
> > > you created a dataframe:
> > > 
> > >   str(d)
> > >   # 'data.frame':   6 obs. of  4 variables:
> > >   #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
> > >   #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
> > >   #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
> > >   #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
> > > 
> > > (with named components "Sepal.Length", ... , "Petal.Width"),
> > > and a dataframe is a special case of a general list. In a
> > > general list, the separate components can each be anything.
> > > In a dataframe, each component is a vector; the different
> > > vectors may be of different types (logical, numeric, ... )
> > > but of course the elements of any single vector must be
> > > of the same type; and, in a dataframe, all the vectors must
> > > have the same length (otherwise it is a general list, not
> > > a dataframe).
> > > 
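The "dataframe is a special case of a list" description can be verified with 
a few base R calls. This is only a sketch; d and d3 mirror the objects used 
in this message, and res is an illustrative name.

```r
d <- head(iris[1:4])
is.list(d)        # TRUE: a data frame is a list underneath
lengths(d)        # every component has length 6

# "$" and "[[" are both list extractors and return the same component
identical(d$Sepal.Width, d[["Sepal.Width"]])   # TRUE

# Components of unequal length are fine in a general list ...
d3 <- list(C1 = c(1.1, 1.2, 1.3), C2 = c(2.1, 2.2, 2.3, 2.4))
lengths(d3)       # 3 and 4

# ... but cannot be turned into a data frame
res <- tryCatch(as.data.frame(d3), error = function(e) conditionMessage(e))
res               # an error message about differing numbers of rows
```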
> > > So, when you print a dataframe, R chooses to display it
> > > as a rectangular structure. On the other hand, when you
> > > print a general list, R displays it quite differently:
> > > 
> > >   d
> > >   #   Sepal.Length Sepal.Width Petal.Length Petal.Width
> > >   # 1          5.1         3.5          1.4         0.2
> > >   # 2          4.9         3.0          1.4         0.2
> > >   # 3          4.7         3.2          1.3         0.2
> > >   # 4          4.6         3.1          1.5         0.2
> > >   # 5          5.0         3.6          1.4         0.2
> > >   # 6          5.4         3.9          1.7         0.4
> > > 
> > >   d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
> > >   d3
> > >   # $C1
> > >   # [1] 1.1 1.2 1.3
> > >   # $C2
> > >   # [1] 2.1 2.2 2.3 2.4
> > > 
> > > Notice the similarity (though not identity) between the print
> > > of d3 and the output of str(d). There is a bit more hard-wired
> > > stuff built into a dataframe which makes it more than simply
> > > a "list with all components vectors of equal length". However,
> > > one could also say that "the rectangular structure is its
> > > main feature".
> > > 
> > > As to why "$" will not work on matrices: a matrix, as Petr
> > > points out, is a vector with a "dimensions" attribute which
> > > has length 2 (as opposed to a general array where the length
> > > of the dimensions attribute could be anything). Hence it is
> > > not a list of named components in the sense of "list".
> > > 
> > > Hence "$" will not work with a matrix, since "$" will not
> > > be able to find any list-components, which is basically what
> > > the error message
> > > 
> > >   d2$Sepal.Width
> > >   # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
> > > 
> > > is telling you: d2 is an atomic vector with a length-2 dimensions
> > > attribute. It has no list-type components for "$" to get its
> > > hands on.
> > > 
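For a syntax that does work on both structures, bracket indexing by column 
name is one option. A sketch, assuming d2 was created as as.matrix(d) (that 
definition is not shown in this thread, so it is an assumption here):

```r
d  <- head(iris[1:4])
d2 <- as.matrix(d)    # assumed definition of d2

# "$" fails on the matrix; capture the message instead of stopping
msg <- tryCatch(d2$Sepal.Width, error = function(e) conditionMessage(e))
msg

# Bracket indexing by column name works for both objects
from_matrix <- d2[, "Sepal.Width"]
from_frame  <- d[, "Sepal.Width"]
all.equal(unname(from_matrix), from_frame)   # TRUE: same values
```

Code that has to accept either a matrix or a data frame can therefore use 
x[, "name"] rather than x$name.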
> > > Ted.
> > > 
> > > --------------------------------------------------------------------
> > > E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> > > Fax-to-email: +44 (0)870 094 0861
> > > Date: 01-Mar-10                                       Time: 12:03:21
> > > ------------------------------ XFMail ------------------------------
> > > 
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help 
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.


