[Rd] lines.formula() problem when data argument is missing (PR#13296)

smckinney at bccrc.ca smckinney at bccrc.ca
Mon Nov 17 22:00:17 CET 2008


Full_Name: Steven McKinney
Version: R 2.8.0 Patched svn rev 46845
OS: powerpc-apple-darwin9.5.0
Submission from: (NULL) (142.103.207.10)




lines.formula() throws an error when the subset argument is used together
with another argument passed through '...' (e.g. col) and nothing is
provided for the data argument.

Reproduce:

x<-1:5
y<-c(1,3,NA,2,5)
plot(y~x, type="n")  # set up frame
lines(y~x, subset=!is.na(y))  # works OK
lines(y~x, type="o", col="blue")  # works OK
# but
lines(y~x, subset=!is.na(y), col="red")  # gives an error:

Error in if (length(x) == l) x[s] else x : argument is of length zero

This situation is handled appropriately by points.formula().

Following the coding style of points.formula(), the
proposed modifications would be:

lines.formula <-
function (formula, data = parent.frame(), ..., subset) 
{
    m <- match.call(expand.dots = FALSE)
    if (is.matrix(eval(m$data, parent.frame()))) 
        m$data <- as.data.frame(data)
    dots <- m$...
    dots <- lapply(dots, eval, data, parent.frame())
    m$... <- NULL
    m[[1]] <- as.name("model.frame")
    m <- as.call(c(as.list(m), list(na.action = NULL)))
    mf <- eval(m, parent.frame())
    if (!missing(subset)) {
        s <- eval(m$subset, data, parent.frame())
###current        
###       l <- nrow(data)
###\current        
###new (as per points.formula)
        if (!missing(data)) {
            l <- nrow(data)
        }
        else {
            mtmp <- m
            mtmp$subset <- NULL
            l <- nrow(eval(mtmp, parent.frame()))
        }
###\new
        dosub <- function(x) if (length(x) == l) 
            x[s]
        else x
###current
###        dots <- lapply(dots, dosub, s)
###\current
###new (as per points.formula)
        dots <- lapply(dots, dosub)
###\new
    }
    response <- attr(attr(mf, "terms"), "response")
    if (response) {
        varnames <- names(mf)
        y <- mf[[response]]
        if (length(varnames) > 2) 
            stop("cannot handle more than one 'x' coordinate")
        xn <- varnames[-response]
        if (length(xn) == 0) 
            do.call("lines", c(list(y), dots))
        else do.call("lines", c(list(mf[[xn]], y), dots))
    }
    else stop("must have a response variable")
}
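
For illustration only, the row-count fallback used above can be tried
outside lines.formula(). This is a minimal sketch, not the code in the
graphics package: the names mf_full and dosub_demo are hypothetical, and
it assumes x and y live in the calling environment, as in the example at
the top of this report.

```r
# Illustration only: recover the row count when no 'data' is supplied.
x <- 1:5
y <- c(1, 3, NA, 2, 5)

# Build the model frame without 'subset'; na.action = NULL keeps the NA row.
mf_full <- model.frame(y ~ x, na.action = NULL)
l <- nrow(mf_full)

# The subset condition, evaluated in the calling environment.
s <- !is.na(y)

# Subset only those extra arguments that have one value per row;
# shorter arguments (e.g. a single colour) pass through unchanged.
dosub_demo <- function(arg) if (length(arg) == l) arg[s] else arg

col_one  <- dosub_demo("red")
col_each <- dosub_demo(c("red", "blue", "green", "black", "grey"))
```

With l taken from the model frame rather than from nrow(data), the
length comparison inside the helper always has a proper number to
compare against, whether or not data was supplied.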


Original report from R-help was:


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of John Field
> Sent: Thursday, November 13, 2008 4:41 PM
> To: r-help at r-project.org
> Subject: [R] lines.formula with subset
>
> Dear list,
>
> When I try to use lines.formula with subset and another argument I
> get an error.
> e.g.
>
> x<-1:5
> y<-c(1,3,NA,2,5)
> plot(y~x, type="n")  # set up frame
> lines(y~x, subset=!is.na(y))  # works OK
> lines(y~x, type="o", col="blue")  # works OK
> # but
> lines(y~x, subset=!is.na(y), col="red")  # gives an error:
>
> Error in if (length(x) == l) x[s] else x : argument is of length zero
>
> Why does this happen?
>

It happens because the function
graphics:::lines.formula

tries to determine the number of rows in the data frame
containing the variables in the formula y~x
(see the line of code
l <- nrow(data)
in graphics:::lines.formula; this is the 'el' in the
'length(x) == l' portion of the line you see in the
error message).

Because you did not provide the data frame, 'data' takes
its default value parent.frame(), which is an environment
rather than a data frame, so nrow(data) returns NULL.
The if() condition thus becomes 'length(x) == NULL', which
evaluates to logical(0), an invalid condition for if().
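
A minimal sketch of that chain of events, for illustration only; the
variable names e, n, cond and msg are made up here and do not appear
in lines.formula():

```r
# nrow() of an environment is NULL, because an environment has no dim().
e <- environment()
n <- nrow(e)

# Comparing a length with NULL gives a zero-length logical vector.
cond <- length(1:5) == n

# if() rejects a zero-length condition; capture the error message
# instead of stopping.
msg <- tryCatch(if (cond) "yes" else "no",
                error = function(err) conditionMessage(err))
```

Here msg holds the error text raised by if() rather than "yes" or "no".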

Done this way, all is well:

mydf <- data.frame(x = 1:5, y = c(1,3,NA,2,5))
plot(y~x, type="n", data = mydf)  # set up frame
lines(y~x, subset=!is.na(y), data = mydf)  # works OK
lines(y~x, type="o", col="blue", data = mydf)  # works OK
lines(y~x, subset=!is.na(y), col="red", data = mydf)  # works OK

The formula-based functions expect a data frame in the
'data' argument, but do not enforce this in this case.

This may qualify as a logical bug in the graphics:::lines.formula
function.  An error should have been thrown before the if()
clause evaluation, but I'm not sure where in the chain of
function calls the check for a valid data object should be
done and the error thrown. Otherwise, the data objects
y and x that you set up should have been passed downwards
in some fashion for evaluation.  R-core members who know
the rules better than I will have to determine how best to
handle this one.

HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney at bccrc.ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3

Canada



> With thanks,
> John
>
> =========================
> John Field Consulting Pty Ltd
> 10 High Street, Burnside SA 5066
> Phone 08 8332 5294 or 0409 097 586
> Fax   08 8332 1229
> Email  JohnField at ozemail.com.au
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




--please do not edit the information below--

Version:
 platform = powerpc-apple-darwin9.5.0
 arch = powerpc
 os = darwin9.5.0
 system = powerpc, darwin9.5.0
 status = Patched
 major = 2
 minor = 8.0
 year = 2008
 month = 11
 day = 06
 svn rev = 46845
 language = R
 version.string = R version 2.8.0 Patched (2008-11-06 r46845)

Locale:
C

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils,
package:datasets, package:methods, Autoloads, package:base


