[R] two questions for R beginners

John Sorkin jsorkin at grecc.umaryland.edu
Wed Mar 3 16:30:39 CET 2010


Petr,
On the other hand . . .

> mat<-matrix(1:12, 3,4)
> dat<-as.data.frame(mat)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> dat
  V1 V2 V3 V4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12

What your example demonstrates is how the data are organized deep in the guts of R, not the way people, especially R beginners, visualize objects in their minds. When I think of the integer sixty-nine, I visualize 69, not 1000101, despite the fact that 69, as an integer, is represented in the computer as 1000101.
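
A minimal sketch of that point, using the same mat and dat as above: both objects answer to the same [row, column] syntax, and the difference only surfaces with list-style access such as "$". The tryCatch() wrapper and the names m_elem, d_elem, d_col and m_col are illustrative additions, there only so the snippet runs to the end.

```r
mat <- matrix(1:12, 3, 4)
dat <- as.data.frame(mat)

# The same [row, column] syntax works on both objects
m_elem <- mat[2, 3]
d_elem <- dat[2, 3]

# List-style access with `$` works on the data frame
d_col <- dat$V3

# On the matrix, `$` raises an error; capture it so the script continues
m_col <- tryCatch(mat$V3, error = function(e) "error")
```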
John







John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Petr PIKAL <petr.pikal at precheza.cz> 3/3/2010 9:44 AM >>>
"John Sorkin" <jsorkin at grecc.umaryland.edu> wrote on 01.03.2010 15:19:10:

> If it looks like a duck and quacks like a duck, it ought to behave like a duck.
> 
> To the user a matrix and a dataframe look alike . . . except a dataframe can

Well, a matrix looks like a data.frame only at first sight.

mat<-matrix(1:12, 3,4)
dat<-as.data.frame(mat)


str(dat)
'data.frame':   3 obs. of  4 variables:
 $ V1: int  1 2 3
 $ V2: int  4 5 6
 $ V3: int  7 8 9
 $ V4: int  10 11 12

str(mat)
 int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...

That seems to me a pretty different look.
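
To make the difference that str() reports more concrete, here is a small sketch (not part of the original exchange) that queries the two objects directly. It uses only base R functions; the name dim_attr is illustrative.

```r
mat <- matrix(1:12, 3, 4)
dat <- as.data.frame(mat)

# A data frame is a list of columns; a matrix is an atomic vector
is.list(dat)      # TRUE
is.list(mat)      # FALSE
typeof(dat)       # "list"
typeof(mat)       # "integer"

# length() counts columns for the data frame, elements for the matrix
length(dat)       # 4
length(mat)       # 12

# The matrix carries its shape only in the dim attribute
dim_attr <- attributes(mat)$dim   # 3 4
```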

Regards
Petr


> hold non-numeric values. Thus to the users, a matrix looks like a special case
> of a DF, or perhaps conversely. If you can address elements of one structure
> using a given syntax, you should be able to address elements of the other
> structure using the same syntax. To do otherwise leads to confusion and is
> counterintuitive.
> John
> 
> 
> 
> 
> >>> Petr PIKAL <petr.pikal at precheza.cz> 3/1/2010 8:57 AM >>>
> Hi
> 
> r-help-bounces at r-project.org wrote on 01.03.2010 13:03:24:
> 
> < snip>
> 
> > > 
> > > I understand that a 2-dimensional rectangular matrix looks quite
> > > similar to a data frame; however, it is only a vector with dimensions.
> > > As such it can have items of only one type (numeric, character, ...).
> > > And you can easily change the dimensions of a matrix.
> > > 
> > > matrix<-1:12
> > > dim(matrix) <- c(2,6)
> > > matrix
> > > dim(matrix) <- c(2,2,3)
> > > matrix
> > > dim(matrix) <-NULL
> > > matrix
> > > 
> > > So rectangular structure of printed matrix is a kind of coincidence
> > > only, whereas rectangular structure of data frame is its main feature.
> > > 
> > > Regards
> > > Petr
> > >> 
> > >> -- 
> > >> Karl Ove Hufthammer
> > 
> > Petr, I think that could be confusing! The way I see it is that
> > a matrix is a special case of an array, whose "dimension" attribute
> > is of length 2 (number of "rows", number of "columns"); and "row"
> > and "column" refer to the rectangular display which you see when
> > R prints the matrix. And this, of course, derives directly from
> > the historic rectangular view of a matrix when written down.
> > 
> > When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
> > you stripped it of its special title of "matrix" and cast it out
> > into the motley mob of arrays (some of whom are matrices, but
> > "matrix" no longer is).
> > 
> > So the "rectangular structure of printed matrix" is not a coincidence,
> > but is its main feature!
> 
> Ok. Point taken. However, I feel that the possibility of manipulating
> matrix/array dimensions by simply changing them, as I showed above,
> together with perceiving a matrix as a **vector with dimensions**, prevented
> me, especially in the early days, from using matrices instead of data frames
> and vice versa.
> 
> Consider the confusing results of cbind and rbind for vectors of unequal mode.
> Far too often we see something like this
> 
> > cbind(1:2,letters[1:2])
>      [,1] [,2]
> [1,] "1"  "a" 
> [2,] "2"  "b" 
> 
> instead of
> 
> > data.frame(1:2,letters[1:2])
>   X1.2 letters.1.2.
> 1    1            a
> 2    2            b
> 
> and then a question about why the result does not behave as expected. Each
> type of object has some features which are good for some types of
> manipulation/analysis/plotting but quite detrimental for others.
> 
> Regards
> Petr
> 
> 
> > 
> > To come back to Karl's query about why "$" works for a dataframe
> > but not for a matrix, note that "$" is the extractor for getting
> > a named component of a list. So, Karl, when you did
> > 
> >   d=head(iris[1:4])
> > 
> > you created a dataframe:
> > 
> >   str(d)
> >   # 'data.frame':   6 obs. of  4 variables:
> >   #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
> >   #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
> >   #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
> >   #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
> > 
> > (with named components "Sepal.Length", ... , "Petal.Width"),
> > and a dataframe is a special case of a general list. In a
> > general list, the separate components can each be anything.
> > In a dataframe, each component is a vector; the different
> > vectors may be of different types (logical, numeric, ... )
> > but of course the elements of any single vector must be
> > of the same type; and, in a dataframe, all the vectors must
> > have the same length (otherwise it is a general list, not
> > a dataframe).
> > 
> > So, when you print a dataframe, R chooses to display it
> > as a rectangular structure. On the other hand, when you
> > print a general list, R displays it quite differently:
> > 
> >   d
> >   #   Sepal.Length Sepal.Width Petal.Length Petal.Width
> >   # 1          5.1         3.5          1.4         0.2
> >   # 2          4.9         3.0          1.4         0.2
> >   # 3          4.7         3.2          1.3         0.2
> >   # 4          4.6         3.1          1.5         0.2
> >   # 5          5.0         3.6          1.4         0.2
> >   # 6          5.4         3.9          1.7         0.4
> > 
> >   d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
> >   d3
> >   # $C1
> >   # [1] 1.1 1.2 1.3
> >   # $C2
> >   # [1] 2.1 2.2 2.3 2.4
> > 
> > Notice the similarity (though not identity) between the print
> > of d3 and the output of str(d). There is a bit more hard-wired
> > stuff built into a dataframe which makes it more than simply
> > a "list with all components vectors of equal length". However,
> > one could also say that "the rectangular structure is its
> > main feature".
> > 
> > As to why "$" will not work on matrices: a matrix, as Petr
> > points out, is a vector with a "dimensions" attribute which
> > has length 2 (as opposed to a general array where the length
> > of the dimensions attribute could be anything). Hence it is
> > not a list of named components in the sense of "list".
> > 
> > Hence "$" will not work with a matrix, since "$" will not
> > be able to find any list-components, which is basically what
> > the error message
> > 
> >   d2$Sepal.Width
> >   # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
> > 
> > is telling you: d2 is an atomic vector with a length-2 dimensions
> > attribute. It has no list-type components for "$" to get its
> > hands on.
> > 
> > Ted.
> > 
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 01-Mar-10                                       Time: 12:03:21
> > ------------------------------ XFMail ------------------------------
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 