[Rd] Bug in "$<-.data.frame" yields corrupt data frame (PR#13724)

smckinney at bccrc.ca smckinney at bccrc.ca
Wed May 27 00:50:11 CEST 2009


Full_Name: Steven McKinney
Version: 2.9.0
OS: Mac OS X 10.5.6 
Submission from: (NULL) (142.103.207.10)



A corrupt data frame can be constructed as follows:
foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)
bar$NewCol <- foo[foo[, 1] == 4, 4]
bar
lapply(bar, length)




> foo <- matrix(1:12, nrow = 3)
> bar <- data.frame(foo)
> bar$NewCol <- foo[foo[, 1] == 4, 4] bar
   X1 X2 X3 X4 NewCol
 1  1  4  7 10   <NA>
 2  2  5  8 11   <NA>
 3  3  6  9 12   <NA>
 Warning message:
 In format.data.frame(x, digits = digits, na.encode = FALSE) :
   corrupt data frame: columns will be truncated or padded with NAs
> lapply(bar, length)
 $X1
 [1] 3

 $X2
 [1] 3

 $X3
 [1] 3

 $X4
 [1] 3

 $NewCol
 [1] 0


The data.frame method is

> getAnywhere("$<-.data.frame" )
A single object matching '$<-.data.frame' was found It was found in the
following places
  package:base
  registered S3 method for $<- from namespace base
  namespace:base
with value

function (x, i, value)
{
    cl <- oldClass(x)
    class(x) <- NULL
    nrows <- .row_names_info(x, 2L)
    if (!is.null(value)) {
        N <- NROW(value)
        if (N > nrows) 
            stop(gettextf("replacement has %d rows, data has %d", 
                N, nrows), domain = NA)
        if (N < nrows && N > 0L) 
            if (nrows%%N == 0L && length(dim(value)) <= 1L) 
                value <- rep(value, length.out = nrows)
            else stop(gettextf("replacement has %d rows, data has %d", 
                N, nrows), domain = NA)
        if (is.atomic(value)) 
            names(value) <- NULL
    }
    x[[i]] <- value
    class(x) <- cl
    return(x)
}<environment: namespace:base>
>


I placed a browser() command before return(x) and did some poking
around.  The issue is that the example above creates an object with
N < nrows but N == 0L, so either an else clause to check for this
condition is needed, or, it appears to me, the N > 0L part of the
conditional clause needs to be moved to the next if clause.

I modified the rows
          if (N < nrows && N > 0L) 
            if (nrows%%N == 0L && length(dim(value)) <= 1L)
to read
           if (N < nrows) 
            if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L)

as in

"$<-.data.frame" <-
function (x, i, value) 
{
    cl <- oldClass(x)
    class(x) <- NULL
    nrows <- .row_names_info(x, 2L)
    if (!is.null(value)) {
        N <- NROW(value)
        if (N > nrows) 
            stop(gettextf("replacement has %d rows, data has %d", 
                N, nrows), domain = NA)
        if (N < nrows) 
            if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L) 
                value <- rep(value, length.out = nrows)
            else stop(gettextf("replacement has %d rows, data has %d", 
                N, nrows), domain = NA)
        if (is.atomic(value)) 
            names(value) <- NULL
    }
    x[[i]] <- value
    class(x) <- cl
    return(x)
} 

Now it detects the problem above:

> foo <- matrix(1:12, nrow = 3)
> bar <- data.frame(foo)
> bar$NewCol <- foo[foo[, 1] == 4, 4]
Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : 
  replacement has 0 rows, data has 3

It doesn't appear to stumble on weird data frames (these from the
?data.frame help page)


> L3 <- LETTERS[1:3]
> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10,
replace=TRUE)))
> (d0  <- d[, FALSE]) # NULL data frame with 10 rows
 
> (d.0 <- d[FALSE, ]) # <0 rows> data frame  (3 cols)

> (d00 <- d0[FALSE,])  # NULL data frame with 0 rows
 
> d0$NewCol <- foo[foo[, 1] == 4, 4]
Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : 
  replacement has 0 rows, data has 10

### Catches this problem above alright.

> d.0$NewCol <- foo[foo[, 1] == 4, 4]
> d.0
[1] x      y      fac    NewCol
<0 rows> (or 0-length row.names)

### Lets the above one through alright.

> d00$NewCol <- foo[foo[, 1] == 4, 4]
> 
> d00
[1] NewCol
<0 rows> (or 0-length row.names)
### Lets the above one through alright.


Would the above modification work to fix this problem?






> sessionInfo()
 R version 2.9.0 (2009-04-17)
 powerpc-apple-darwin8.11.1

 locale:
 en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] nlme_3.1-90

 loaded via a namespace (and not attached):
 [1] grid_2.9.0      lattice_0.17-22 tools_2.9.0


 Also occurs on Windows box with R 2.8.1



 Steven McKinney

 Statistician
 Molecular Oncology and Breast Cancer Program British Columbia Cancer
 Research Centre

 email: smckinney +at+ bccrc +dot+ ca

 tel: 604-675-8000 x7561

 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C.
 V5Z 1L3
 Canada



More information about the R-devel mailing list