[Rd] several bugs (PR#918) lists and matrices

Rich Heiberger rmh@surfer.sbm.temple.edu
Mon, 30 Apr 2001 17:55:45 -0400 (EDT)

Thank you, Thomas, for suggesting that I review Rob's proceedings
paper on data.  I liked that paper when I heard it presented and was
happy to reread it.  I agree with everything he said there.  After
rereading it, I went back to the Blue Book, and I have several comments.

Thomas said:
  You can't put a list into a matrix.  Matrices handle homogeneous
  data; they are vectors with a dimension attribute.  Lists with an
  arbitrary dimension attribute are, as Rob pointed out, an
  unimplemented bug.  However, rectangular things with arbitrary data
  in them do exist.  They're called data frames.

Thomas's statement is a clear change in perspective from the original
Blue Book interpretation.  The Blue Book position, and my starting
position, is that "All objects are vectors."  Objects can be atomic or
recursive, and the text makes it very clear that subscripting applies
to all vectors, no matter how complex.  The description of dim(x) on
page 438 is very clear: x is any object.  The descriptions of array()
(page 382) and matrix() (page 504) are equally clear: "the array class
of objects are those that have an attribute dim, ...."

John's recent file repeats this generality:
  The original history of S (if you're interested see the note) led to
  a vector-oriented approach to all objects; that is, all objects were
  vectors (one-way arrays). Lists and other recursive objects were
  special only in that the elements of the vector were themselves

It is less explicit in the R help, but it is still indicated by the
phrase "Retrieve or set the dimension of an OBJECT." in the description
of dim().  The R help pages for array() and matrix() both say "data: a
vector giving data to fill the array", and the help for vector() clearly
permits vectors of mode list.  Nowhere do I see a claim that matrices
or arrays must contain only homogeneous atomic objects.

R currently (rw1021) treats a dimensioned list as a matrix (but not
as a vector):
  > x <- list(1,2,3,4,5,6,7,8,9,10)
  > is.vector(x)
  [1] TRUE
  > dim(x) <- c(2,5)
  > x
       [,1]        [,2]        [,3]        [,4]        [,5]       
  [1,] "Numeric,1" "Numeric,1" "Numeric,1" "Numeric,1" "Numeric,1"
  [2,] "Numeric,1" "Numeric,1" "Numeric,1" "Numeric,1" "Numeric,1"
  > is.matrix(x)
  [1] TRUE
  > is.vector(x)
  [1] FALSE

A data frame is essentially a list of arbitrarily structured column
vectors.  Rob's insight in this paper is that the individual columns
can themselves contain arbitrarily structured data.  He is doing
marvelous things with those data frame columns.  Everything he does
will work on dimensioned lists and will avoid the two serious
difficulties with data frames that he notes:

a. The whole issue of character strings being converted to factors,
and the nonintuitive I() and AsIs mechanism, is a consequence of using
data frames.  This issue vanishes when dimensioned lists are used
directly.

b. Parameterized data frames can be replaced by a new data structure
in which a "parameter" attribute holds the relevant information.

My attempt to construct a "missing.value" class would work with a
data frame structure, but I don't think it is the best structure.

In summary, I am arguing for permitting
   a <- matrix(list(1,2,3,4,5,6), 2, 3)
along with the currently acceptable
   a <- list(1,2,3,4,5,6)
   dim(a) <- c(2, 3)
and for making subscripting provide the correct answer
       [,1] [,2]
  [1,] 1    3
  [2,] 2    4
rather than the current incorrect answer
       [,1]   [,2]
  [1,] "NULL" "NULL"
  [2,] "NULL" "NULL"

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch