[Rd] BOD causes error in 2.4.0

Martin Maechler maechler at stat.math.ethz.ch
Fri Aug 11 09:11:42 CEST 2006


>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at gmail.com>
>>>>>     on Thu, 10 Aug 2006 11:34:48 -0400 writes:

    Gabor> Using "R version 2.4.0 Under development (unstable)
    Gabor> (2006-08-08 r38825)" on Windows XP and starting in a
    Gabor> fresh session we get an error if we type BOD.  (There
    Gabor> is no error in "Version 2.3.1 Patched (2006-06-04
    Gabor> r38279)".)

    >> BOD

    Gabor> Error in data.frame(Time = c("1", "2"), demand = c(" 8.3", "10.3"),
    Gabor> check.names = FALSE,  :
    Gabor> row names contain missing values

    Gabor> In addition: Warning message:
    Gabor> corrupt data frame: columns will be truncated or padded with NAs in:
    Gabor> format.data.frame(x, digits = digits, na.encode = FALSE)

Yes, thank you Gabor.
At first, it is peculiar that our standard checks haven't
detected this bug themselves, since the help page of BOD uses
BOD without any error.

Indeed the error happens in format.data.frame() which is called
from print.data.frame.
Interestingly, good old  str() "works", and its output is quite revealing:

> str(BOD)
'data.frame':	2 obs. of  2 variables:
 $ Time  : num  1 2 3 4 5 7
 $ demand: num  8.3 10.3 19 16 15.6 19.8
 - attr(*, "reference")= chr "A1.4, p. 270"

note the '2 obs.' part when there obviously are 6 observations ...
Now, if you really inspect the object,

> dput(BOD, control = "all")
structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3, 
19, 16, 15.6, 19.8)), .Names = c("Time", "demand"), row.names = c(NA, 
6), class = "data.frame", reference = "A1.4, p. 270")

it becomes clearer:  the row.names have really become a mess,
where they should have been (as in R <= 2.3.x)
the equivalent of
    row.names = c("1", "2", "3", "4", "5", "6")
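The mismatch described above (6 values per column, but row names that
claim fewer rows) can be checked mechanically.  The following is only a
sketch: rows_consistent() is a hypothetical helper, not part of R, and
the "broken" object is built by hand with two literal row names to
imitate the symptom; it is not the actual BOD object from R-devel.

```r
# Hypothetical helper: does every column have as many elements as
# there are row names?
rows_consistent <- function(x) {
  n_declared  <- length(attr(x, "row.names"))
  col_lengths <- vapply(unclass(x), length, integer(1))
  all(col_lengths == n_declared)
}

# A well-formed data frame with the BOD values.
good <- data.frame(Time   = c(1, 2, 3, 4, 5, 7),
                   demand = c(8.3, 10.3, 19, 16, 15.6, 19.8))

# A deliberately corrupt one: 6 values per column, only 2 row names
# (an assumption made to mimic the "2 obs." output of str()).
broken <- structure(list(Time   = c(1, 2, 3, 4, 5, 7),
                         demand = c(8.3, 10.3, 19, 16, 15.6, 19.8)),
                    row.names = c("1", "2"),
                    class = "data.frame")

rows_consistent(good)
rows_consistent(broken)
```

A check of this kind on the shipped datasets would presumably have
flagged the problem before anyone tried to print the object.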

Now if you look at the source code,
   <..R..>/src/library/datasets/data/BOD.R
you'll see that the `bug' is already in the source : it has
            row.names = c(NA, 6),
explicitly there.

Of course this has something to do with the new R-devel feature
of storing row names in a ``compressed'' form when they are equivalent to 
  as.character(1:n)
and I assume the c(NA, 6) used to be a trick for making the
row.names `compressed' - however the trick was not working correctly.
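For illustration, here is how the compressed form can be inspected in a
current R release.  This is a sketch only: .row_names_info() is the
accessor documented in today's base R, and the internal representation
in the 2006 development snapshot discussed here may well have differed.

```r
# Row names generated automatically by data.frame() are stored compactly.
df <- data.frame(Time   = c(1, 2, 3, 4, 5, 7),
                 demand = c(8.3, 10.3, 19, 16, 15.6, 19.8))

# type = 0L: the internal "row.names" attribute (the compact form)
internal <- .row_names_info(df, type = 0L)

# type = 2L: the number of rows implied by that attribute
n_rows <- .row_names_info(df, type = 2L)

# What the user sees is still the equivalent of as.character(1:n)
visible <- row.names(df)

print(internal)
print(n_rows)
print(visible)
```

The point is that the user-visible row names and the stored attribute
are two different things, so hand-written source files that set
row.names directly have to match whatever the current internal
convention is.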

I've temporarily fixed the problem by putting 
     row.names = as.character(1:6),
there.
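As a sketch of what the temporary fix amounts to, the object can be
rebuilt from the dput() output above with explicit character row names.
The name BOD_fixed is only illustrative; the real change was made in the
BOD.R source file.

```r
# Same values as in the dput() output, but with explicit row names.
BOD_fixed <- structure(
  list(Time   = c(1, 2, 3, 4, 5, 7),
       demand = c(8.3, 10.3, 19, 16, 15.6, 19.8)),
  row.names = as.character(1:6),
  class     = "data.frame",
  reference = "A1.4, p. 270")

# Six rows, no missing row names, and printing goes through
# format.data.frame() without complaint.
stopifnot(nrow(BOD_fixed) == 6L, !anyNA(row.names(BOD_fixed)))
print(BOD_fixed)
```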

Thanks again for the report.
Martin Maechler, ETH Zurich


