[Rd] Bounty on Error Checking

ivo welch ivo.welch at gmail.com
Fri Jan 4 17:28:26 CET 2013


gents---first, thanks a lot for paying some attention to my suggestion.

I always write programs with options(warn=2).  but it doesn't cover
everything.  In particular, my code is littered with hand tests that
the dimensions are correct and that variables are defined:

stopifnot(is.data.frame(d), "x" %in% names(d), is.numeric(d$x))

of course, I should already know that d$x is what it is supposed to
be, but the whole point is that I make mistakes.  my suggestion is
also not so much for myself, but for all of our students that we get
involved with R.  it is one thing for me to fix my own problems.  it
is another for me to be comfortable recommending that our students
learn programming in R.
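
a hand test like the one above can be wrapped in a small helper so that it is not retyped everywhere.  this is only a sketch: check_numeric_col is a made-up name, not a base R function, and it covers just the "is a data frame with a numeric column" case.

```r
# Hypothetical helper (not part of base R): fail early unless `d` is a
# data frame that contains a numeric column called `name`.
check_numeric_col <- function(d, name) {
  stopifnot(
    is.data.frame(d),
    is.character(name), length(name) == 1L,
    name %in% names(d),
    is.numeric(d[[name]])
  )
  invisible(d)
}

d <- data.frame(x = c(1.5, 2.5), g = c("a", "b"))
check_numeric_col(d, "x")                             # passes silently
res <- try(check_numeric_col(d, "y"), silent = TRUE)  # "y" is not a column
inherits(res, "try-error")                            # TRUE
```

passing the conditions to stopifnot() as separate arguments means the error message names the first condition that failed.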

perl has an exact equivalent to the variable definition that I think
would help a great deal:
   use warnings FATAL => qw{ uninitialized };

one can test whether a variable is NULL.  one can assign to a variable
that is NULL.  one cannot *use* a variable that is NULL.   I presume
this mostly means that code such as d=data.frame( x=2 ); return(d$y+2)
would abort.   I guess a better perspective would be that R should
limit what one can do with NULL, not that it should limit the
variables.  R is too generous in allowing mismatched operations.
(this also applies to silent automatic repetition of matrices to make
dimensions fit;  for every time that it helps, there are probably two
times when it bites.)  not being a programmer or language designer, I
am not the best person to suggest what to improve.  but the strictness
of R seems too lax right now, making error tracking too difficult,
from the perspective of an end user.  this is partly why I suggested a
more general bounty to improve on this aspect of R, rather than on my
specific issue(s).
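
to make the two complaints concrete, here is a small sketch of what R currently does without any error or warning, followed by one possible strict lookup.  strict_get is a hypothetical name used for illustration only; it is not an existing R function or option.

```r
d <- data.frame(x = 2)

d$y                 # NULL: the column does not exist, yet nothing is signalled
length(d$y + 2)     # 0: arithmetic on NULL silently yields numeric(0)

m <- matrix(1:6, nrow = 2)
s <- 10             # a scalar picked up by mistake instead of a matrix
dim(m * s)          # 2 3: the scalar is recycled across the whole matrix

# Hypothetical strict accessor: stop instead of returning NULL.
strict_get <- function(obj, name) {
  if (!(name %in% names(obj))) {
    stop("no element named '", name, "'", call. = FALSE)
  }
  obj[[name]]
}

strict_get(d, "x") + 2                             # 4
res <- try(strict_get(d, "y") + 2, silent = TRUE)  # aborts instead of numeric(0)
inherits(res, "try-error")                         # TRUE
```

none of the silent cases are caught by options(warn=2), because no warning is issued in the first place.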

best,

/iaw
----
Ivo Welch (ivo.welch at gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Editor, Critical Finance Review, http://www.critical-finance-review.org/



On Fri, Jan 4, 2013 at 7:38 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> On 04.01.2013 15:22, Duncan Murdoch wrote:
>>
>> On 04/01/2013 10:15 AM, Matthew Dowle wrote:
>>>
>>> On 04.01.2013 14:56, Duncan Murdoch wrote:
>>> > On 04/01/2013 9:51 AM, Matthew Dowle wrote:
>>> >> On 04.01.2013 14:03, Duncan Murdoch wrote:
>>> >> > On 13-01-04 8:32 AM, Matthew Dowle wrote:
>>> >> >>
>>> >> >> On Fri, Jan 3, 2013, Bert Gunter wrote
>>> >> >>> Well...
>>> >> >>>
>>> >> >>> On Thu, Jan 3, 2013 at 10:00 AM, ivo welch <ivo.welch <at>
>>> >> >>> anderson.ucla.edu> wrote:
>>> >> >>>>
>>> >> >>>> Dear R developers---I just spent half a day debugging an R
>>> >> >>>> program,
>>> >> >>>> which had two bugs---I selected the wrongly named variable,
>>> >> which
>>> >> >>>> turns out to have been a scalar, which then happily multiplied
>>> >> as
>>> >> >>>> if
>>> >> >>>> it was a matrix; and another wrongly named variable from a data
>>> >> >>>> frame,
>>> >> >>>> that triggered no error when used as a[["name"]] or a$name .
>>> >> >>>> there
>>> >> >>>> should be an option to turn on that throws an error inside R
>>> >> when
>>> >> >>>> one
>>> >> >>>> does this.  I cannot imagine that there is much code that wants
>>> >> to
>>> >> >>>> reference non-existing columns in data frames.
>>> >> >>>
>>> >> >>> But I can -- and do it all the time: To add a new variable, "d"
>>> >> to
>>> >> >>> a
>>> >> >>> data frame, df,  containing only "a" and "b" (with 10 rows,
>>> >> say):
>>> >> >>>
>>> >> >>> df[["d"]] <- 1:10
>>> >> >>
>>> >> >> Yes but that's `[[<-`. Ivo was talking about `[[` and `$`; i.e.,
>>> >> >> select
>>> >> >> only not assign, if I understood correctly.
>>> >> >>
>>> >> >>>
>>> >> >>> Trying to outguess documentation to create error triggers is a
>>> >> very
>>> >> >>> bad idea.
>>> >> >>
>>> >> >> Why exactly is it a very bad idea? (I don't necessarily disagree,
>>> >> >> just
>>> >> >> asking
>>> >> >> for more colour.)
>>> >> >>
>>> >> >>> R already has plenty of debugging tools -- and there is even a
>>> >> >>> "debug"
>>> >> >>> package. Perhaps you need a better programming editor/IDE. There
>>> >> >>> are
>>> >> >>> several listed on CRAN, RStudio, etc.
>>> >> >>
>>> >> >> True, but that relies on you knowing there's a bug to hunt for.
>>> >> What
>>> >> >> if
>>> >> >> you
>>> >> >> don't know you're getting incorrect results, silently? In a
>>> >> similar
>>> >> >> way
>>> >> >> that options(warn=2) turns known warnings into errors, to enable
>>> >> you
>>> >> >> to
>>> >> >> be
>>> >> >> more strict if you wish,
>>> >> >
>>> >> > I would say the point of options(warn=2) is rather to let you find
>>> >> > the location of the warning more easily, because it will abort the
>>> >> > evaluation.
>>> >>
>>> >> True but as well as that, I sometimes like to run production systems
>>> >> with
>>> >> options(warn=2). I'd prefer some tasks to halt at the slightest hint
>>> >> of
>>> >> trouble than write a warning silently to a log file that may not be
>>> >> looked
>>> >> at. I think of that as being more strict, more robust. Since
>>> >> options(warn=2)
>>> >> is set even when there is no warning, to catch if one arises in
>>> >> future.
>>> >> Not
>>> >> just to find it more easily once you know there is a warning.
>>> >>
>>> >> > I would not recommend using code that issues warnings.
>>> >>
>>> >> Not sure what you mean here.
>>> >
>>> > I just meant that I consider warnings to be a problem (as you do), so
>>> > they should all be fixed.
>>>
>>> I see now, good.
>>>
>>> >
>>> >>
>>> >> >
>>> >> > an option to turn on warnings from `[[` and
>>> >> >> `$`
>>> >> >> if the column is missing (select only, not assign) doesn't seem
>>> >> like
>>> >> >> a
>>> >> >> bad option to have. Maybe it would reveal some previously silent
>>> >> >> bugs.
>>> >> >
>>> >> > I agree that this would sometimes be useful, but a very common
>>> >> > convention is to do something like
>>> >> >
>>> >> > if (is.null(obj$element)) {  do something }
>>> >> >
>>> >> > These would all have to be re-written to something like
>>> >> >
>>> >> > if (missing.field(obj, "element")) { do something }
>>> >> >
>>> >> > There are several hundred examples of the first usage in base R; I
>>> >> > imagine thousands more in contributed packages.
>>> >>
>>> >> Yes but Ivo doesn't seem to be writing that if() in his code. We're
>>> >> only talking about an option that users can turn on for their own
>>> >> code, iiuc. Not anything that would affect or break thousands of
>>> >> packages. That's why I referred to the fact that all packages now
>>> >> have namespaces, in the earlier post.
>>> >>
>>> >> > I don't think the
>>> >> > benefit of the change is worth all the work that would be
>>> >> necessary
>>> >> > to
>>> >> > implement it.
>>> >>
>>> >> It doesn't seem to be a lot of work. I already posted a working
>>> >> straw man, for example, as a first step.
>>> >
>>> > I understood the proposal to be that evaluating "obj$element" would
>>> > issue a warning if element didn't exist.  If that were the case, then
>>> > the common test
>>> >
>>> > is.null(obj$element)
>>> >
>>> > would issue a warning in the cases where it now returns TRUE.
>>>
>>> Yes, but only for obj$element appearing in Ivo's own code. Not if a
>>> package
>>> does that (including base). That's why I thought masking "[[" and
>>> "$"
>>> in .GlobalEnv might achieve that without affecting packages or base,
>>> although
>>> I don't know how such an option could be made available by R.
>>> Maybe options(strictselect=TRUE) would create those masks in
>>> .GlobalEnv,
>>> and options(strictselect=FALSE) would remove them. A package maintainer
>>> might choose to set that in their package to make it stricter (which
>>> would
>>> create those masks in the package's namespace too).
>>>
>>> Or users could just create those masks themselves, since it's only a
>>> few
>>> lines. Without affecting packages or base.
>>
>>
>> options() are global
>
>
> I realise that. I was thinking that inside the options() function it
> could see if strictselect was being changed and then create the masks
> in .GlobalEnv. But I can see that is ugly, was just thinking out loud.
> Wasn't suggesting that "[[" would look at the value of strictselect.
>
>
>> but a package could change the meaning of $ or
>> [[.  It could even export those new definitions so that people who
>> wanted the strict usage could use it.  It would be hard to get the
>> same performance as the base definitions, but for debugging purposes
>> that might not matter.
>
>
> So in principle this would be a (small) good idea then?  Is it an
> option that R could provide? i.e. something for which a patch file
> for R would be considered by R core?
>
> Matthew
>
>
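
the masking idea discussed in the thread can be sketched in a few lines.  this is not the straw man that was posted earlier, only an illustration under stated assumptions: strict_dollar and strict_env are made-up names, the check is limited to data frames, and the mask lives in a separate environment rather than in .GlobalEnv so that nothing else is affected.

```r
# Hypothetical strict replacement for `$`; only data frames are checked.
strict_dollar <- function(x, name) {
  name <- as.character(substitute(name))   # `$` receives its name unevaluated
  if (is.data.frame(x) && !(name %in% names(x))) {
    stop("no column named '", name, "'", call. = FALSE)
  }
  do.call(base::`$`, list(x, name))        # defer to the base definition
}

# Install the mask in an environment whose parent is the current one.
strict_env <- new.env(parent = environment())
assign("$", strict_dollar, envir = strict_env)

d <- data.frame(x = 2)
evalq(d$x + 2, strict_env)                             # 4
res <- try(evalq(d$y + 2, strict_env), silent = TRUE)  # missing column aborts
inherits(res, "try-error")                             # TRUE
is.null(d$y)                                           # TRUE: code outside strict_env is unaffected
```

assigning the same function to `$` in .GlobalEnv would apply the check to code typed at top level, while functions in packages would keep finding the base definition through their own namespaces.  as noted in the thread, such a mask is slower than the primitive, which may not matter for debugging.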


