[R] two questions for R beginners

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Mar 1 13:03:24 CET 2010


On 01-Mar-10 11:09:51, Petr PIKAL wrote:
> Hi
> r-help-bounces at r-project.org wrote on 01.03.2010 11:26:40:
>> On Mon, 1 Mar 2010 11:02:59 +0100 Karl Ove Hufthammer
>> <karl at huftis.org> 
>> wrote:
>> > > * What were your biggest misconceptions or
>> > > stumbling blocks to getting up and running
>> > > with R?
>> > 
>> > Also I found it quite confusing that
>> 
>> One more thing that still trips me up sometimes. '$' works
>> on data frames but not on matrices (with dimnames/colnames).
>> Even though the two objects *look* exactly the same, '$' on
>> one of them works while '$' on the other gives a *very*
>> confusing error message. Example:
>> 
>>   d=head(iris[1:4])
>>   d2=as.matrix(d)
>> 
>>   d
>>   d2
>> 
>>   d$Sepal.Width
>>   d2$Sepal.Width
>> 
>> Some functions output matrices where you would expect them to
>> output data frames, and then this problem occurs. (Is there a
>> reason why '$' could/should not be made to 'work' on matrices too?)
> 
> I understand that a 2-dimensional rectangular matrix looks quite
> similar to a data frame; however, it is only a vector with dimensions.
> As such it can have items of only one type (numeric, character, ...).
> And you can easily change the dimensions of a matrix.
> 
> matrix<-1:12
> dim(matrix) <- c(2,6)
> matrix
> dim(matrix) <- c(2,2,3)
> matrix
> dim(matrix) <-NULL
> matrix
> 
> So rectangular structure of printed matrix is a kind of coincidence
> only, whereas rectangular structure of data frame is its main feature.
> 
> Regards
> Petr
>> 
>> -- 
>> Karl Ove Hufthammer

Petr, I think that could be confusing! The way I see it is that
a matrix is a special case of an array, whose "dimension" attribute
is of length 2 (number of "rows", number of "columns"); and "row"
and "column" refer to the rectangular display which you see when
R prints the matrix. And this, of course, derives directly from
the historic rectangular view of a matrix when written down.

When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
you stripped it of its special title of "matrix" and cast it out
into the motley mob of arrays (some of whom are matrices, but
"matrix" no longer is).

So the "rectangular structure of printed matrix" is not a coincidence,
but is its main feature!
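
A minimal sketch of that point, using "m" rather than "matrix" as
the object name (a substitution made here so that the matrix()
function is not masked). is.matrix() and is.array() report what
the object counts as while its dimension attribute changes:

```r
m <- 1:12

dim(m) <- c(2, 6)        # dim attribute of length 2: a matrix
is.matrix(m)             # TRUE
is.array(m)              # TRUE

dim(m) <- c(2, 2, 3)     # dim attribute of length 3: an array, not a matrix
is.matrix(m)             # FALSE
is.array(m)              # TRUE

dim(m) <- NULL           # no dim attribute: a plain vector again
is.matrix(m)             # FALSE
is.array(m)              # FALSE
```

The underlying values 1:12 are the same throughout; only the
dimension attribute differs.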

To come back to Karl's query about why "$" works for a dataframe
but not for a matrix, note that "$" is the extractor for getting
a named component of a list. So, Karl, when you did

  d=head(iris[1:4])

you created a dataframe:

  str(d)
  # 'data.frame':   6 obs. of  4 variables:
  #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
  #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
  #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
  #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4

(with named components "Sepal.Length", ... , "Petal.Width"),
and a dataframe is a special case of a general list. In a
general list, the separate components can each be anything.
In a dataframe, each component is a vector; the different
vectors may be of different types (logical, numeric, ... )
but of course the elements of any single vector must be
of the same type; and, in a dataframe, all the vectors must
have the same length (otherwise it is a general list, not
a dataframe).
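
The list nature of a dataframe can be checked directly. The sketch
below is only illustrative: "df" and its column names are made up
for the purpose, and stringsAsFactors = FALSE is set so that the
character column stays character regardless of R version.

```r
d <- head(iris[1:4])

typeof(d)                 # "list"
is.list(d)                # TRUE
is.data.frame(d)          # TRUE

# "$" and "[[" both extract a named list component
identical(d$Sepal.Width, d[["Sepal.Width"]])   # TRUE

# Components may differ in type but agree in length
# (df is a made-up example, not from the thread)
df <- data.frame(n = 1:3,
                 s = c("a", "b", "c"),
                 l = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
sapply(df, class)         # n: "integer", s: "character", l: "logical"
sapply(df, length)        # 3 for every component

# Lengths 3 and 4 cannot be recycled to a common length,
# so data.frame() refuses to build the object
res <- try(data.frame(C1 = c(1.1, 1.2, 1.3),
                      C2 = c(2.1, 2.2, 2.3, 2.4)),
           silent = TRUE)
inherits(res, "try-error")   # TRUE
```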

So, when you print a dataframe, R chooses to display it
as a rectangular structure. On the other hand, when you
print a general list, R displays it quite differently:

  d
  #   Sepal.Length Sepal.Width Petal.Length Petal.Width
  # 1          5.1         3.5          1.4         0.2
  # 2          4.9         3.0          1.4         0.2
  # 3          4.7         3.2          1.3         0.2
  # 4          4.6         3.1          1.5         0.2
  # 5          5.0         3.6          1.4         0.2
  # 6          5.4         3.9          1.7         0.4

  d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
  d3
  # $C1
  # [1] 1.1 1.2 1.3
  #
  # $C2
  # [1] 2.1 2.2 2.3 2.4

Notice the similarity (though not identity) between the printed
form of d3 and the output of str(d). There is a bit more hard-wired
stuff built into a dataframe which makes it more than simply
a "list with all components vectors of equal length". However,
one could also say that "the rectangular structure is its
main feature".
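
One way to peek at some of that extra "hard-wired stuff" is to look
at the attributes of the dataframe and then strip its class. This
is only a sketch of one way to inspect it; "u" is a name introduced
here for illustration.

```r
d <- head(iris[1:4])

names(attributes(d))      # includes "names", "class" and "row.names"
class(d)                  # "data.frame"

u <- unclass(d)           # drop the class attribute
is.list(u)                # TRUE
is.data.frame(u)          # FALSE

print(u)                  # printed component by component, like d3
```

With the class removed, what remains is an ordinary list, so the
rectangular display comes from the "data.frame" class rather than
from the stored values themselves.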

As to why "$" will not work on matrices: a matrix, as Petr
points out, is a vector with a "dimensions" attribute which
has length 2 (as opposed to a general array where the length
of the dimensions attribute could be anything). Hence it is
not a list of named components in the sense of "list".

Hence "$" will not work with a matrix, since "$" will not
be able to find any list-components, which is basically what
the error message

  d2$Sepal.Width
  # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors

is telling you: d2 is an atomic vector with a length-2 dimensions
attribute. It has no list-type components for "$" to get its
hands on.
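
For completeness, a short sketch of what does work on a matrix with
column names: index by name with "[", or convert to a dataframe
first. The names "x1", "x2" and "res" are introduced here only for
illustration.

```r
d  <- head(iris[1:4])
d2 <- as.matrix(d)

is.atomic(d2)             # TRUE
is.list(d2)               # FALSE
names(attributes(d2))     # "dim" and "dimnames", no list components

# "$" fails on the matrix ...
res <- try(d2$Sepal.Width, silent = TRUE)
inherits(res, "try-error")            # TRUE

# ... but indexing by column name works
x1 <- d2[, "Sepal.Width"]

# ... as does converting to a dataframe first
x2 <- as.data.frame(d2)$Sepal.Width
```

Both x1 and x2 hold the same values as d$Sepal.Width (x1 may
additionally carry the row names of the matrix as element names).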

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Mar-10                                       Time: 12:03:21
------------------------------ XFMail ------------------------------


