[Rd] rbind has confusing result for custom sub-class (possible bug?)

Joshua Ulrich jo@h@m@u|r|ch @end|ng |rom gm@||@com
Mon May 27 16:34:34 CEST 2019


Follow-up (inline) on my comment about a potential issue in `[<-.Date`.

On Mon, May 27, 2019 at 9:31 AM Michael Chirico
<michaelchirico4 using gmail.com> wrote:
>
> Yes, thanks for following up on thread here. And thanks again for clearing things up, your email was a finger snap of clarity on the whole issue.
>
> I'll add that actually it was data.table's code at fault on the storage conversion -- note that if you use an arbitrary sub-class 'foo' with no methods defined, it'll stay integer.
>
> That's because [<- calls as.Date and then as.Date.IDate, and that method (ours) has as.numeric(); earlier I had recognized that if we commented that line, the issue was "fixed" but I still wasn't understanding the root cause.
>
> My last curiosity on this issue will be in my follow-up thread.
>
> Mike C
>
> On Mon, May 27, 2019, 10:25 PM Joshua Ulrich <josh.m.ulrich using gmail.com> wrote:
>>
>> On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich <josh.m.ulrich using gmail.com> wrote:
>> >
>> > On Sun, May 26, 2019 at 4:06 AM Michael Chirico
>> > <michaelchirico4 using gmail.com> wrote:
>> > >
>> > > Have finally managed to come up with a fix after checking out sys.calls()
>> > > from within the as.Date.IDate debugger, which shows something like:
>> > >
>> > > [[1]] rbind(DF, DF)
>> > > [[2]] rbind(deparse.level, ...)
>> > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L)
>> > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L)
>> > > [[5]] as.Date(value)
>> > > [[6]] as.Date.IDate(value)
>> > >
>> > > I'm not sure why [<- is called, I guess the implementation is to assign to
>> > > the output block by block? Anyway, we didn't have a [<- method. And
>> > > [<-.Date looks like:
>> > >
>> > > value <- unclass(as.Date(value)) # <- converts to double
>> > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class
>> > >
>> > > So we can fix our bug by defining a [<- class; the question that I still
>> > > don't see answered in documentation or source code is, why/where is [<-
>> > > called, exactly?
>> > >
>> > Your rbind(DF, DF) call dispatches to base::rbind.data.frame().  The
>> > `[<-` call is this line:
>> > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij
>> >
>> > That's where the storage.mode changes from integer to double.
>> >
>> > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij
>> > Browse[2]>
>> > debug: xij
>> > Browse[2]> storage.mode(xij)
>> > [1] "integer"
>> > Browse[2]> value[[jj]][ri]
>> > [1] "2019-05-26"
>> > Browse[2]> storage.mode(value[[jj]][ri])
>> > [1] "integer"
>> > Browse[2]>
>> > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm
>> > Browse[2]> storage.mode(value[[jj]][ri])
>> > [1] "double"
>> >
>> To be clear, I don't think this is a bug in rbind() or
>> rbind.data.frame().  The confusion is that rbind.data.frame() calls
>> `[<-` for each column of the data.frame, and there is no `[<-.IDate`
>> method.  So the parent class method is dispatched, which converts the
>> storage mode to double.
>>
>> Someone may argue that this is an issue with `[<-.Date`, and that it
>> shouldn't convert the storage.mode from integer to double.

I don't think this is an issue.  The storage mode isn't converted if
the replacement is the same storage mode.  For example:

R> x <- .Date(1:5)
R> storage.mode(x)
[1] "integer"
R> x[1L] <- .Date(0L)
R> storage.mode(x)
[1] "integer"
R> x[1L] <- .Date(0)
R> storage.mode(x)
[1] "double"

>> >
>> > > Mike C
>> > >
>> > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico <michaelchirico4 using gmail.com>
>> > > wrote:
>> > >
>> > > > Debugging this issue:
>> > > >
>> > > > https://github.com/Rdatatable/data.table/issues/2008
>> > > >
>> > > > We have custom class 'IDate' which inherits from 'Date' (it just forces
>> > > > integer storage for efficiency, hence, I).
>> > > >
>> > > > The concatenation done by rbind, however, breaks this and returns a double:
>> > > >
>> > > > library(data.table)
>> > > > DF = data.frame(date = as.IDate(Sys.Date()))
>> > > > storage.mode(rbind(DF, DF)$date)
>> > > > # [1] "double"
>> > > >
>> > > > This is specific to base::rbind (data.table's rbind returns an integer as
>> > > > expected); in ?rbind we see:
>> > > >
>> > > > The method dispatching is not done via UseMethod(), but by C-internal
>> > > > dispatching. Therefore there is no need for, e.g., rbind.default.
>> > > > The dispatch algorithm is described in the source file
>> > > > (‘.../src/main/bind.c’) as
>> > > > 1. For each argument we get the list of possible class memberships from
>> > > > the class attribute.
>> > > > 2. *We inspect each class in turn to see if there is an applicable
>> > > > method.*
>> > > > 3. If we find an applicable method we make sure that it is identical to
>> > > > any method determined for prior arguments. If it is identical, we proceed,
>> > > > otherwise we immediately drop through to the default code.
>> > > >
>> > > > It's not clear what #2 means -- an applicable method *for what*? Glancing
>> > > > at the source code would suggest it's looking for rbind.IDate:
>> > > >
>> > > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063
>> > > >
>> > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should
>> > > > be rbind here
>> > > > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over
>> > > > the classes, should get to IDate first
>> > > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate
>> > > >
>> > > > but adding this method (or even exporting it) is no help [ simply defining
>> > > > rbind.IDate = function(...) as.IDate(NextMethod()) ]
>> > > >
>> > > > Lastly, it appears that as.Date.IDate is called, which is causing the type
>> > > > conversion:
>> > > >
>> > > > debug(data.table:::as.Date.IDate)
>> > > > rbind(DF, DF) # launches debugger
>> > > > x
>> > > > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not
>> > > > c(DF$date, DF$date)
>> > > > undebug(data.table:::as.Date.IDate)
>> > > >
>> > > > I can't really wrap my head around why as.Date is being called here, and
>> > > > even allowing that, why the end result is still the original class [
>> > > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ]
>> > > >
>> > > > So, I'm beginning to think this might be a bug. Am I missing something?
>> > > >
>> > >
>> > >         [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-devel using r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> >
>> >
>> > --
>> > Joshua Ulrich  |  about.me/joshuaulrich
>> > FOSS Trading  |  www.fosstrading.com
>> > R/Finance 2019 | www.rinfinance.com
>>
>>
>>
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich
>> FOSS Trading  |  www.fosstrading.com
>> R/Finance 2019 | www.rinfinance.com



-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2019 | www.rinfinance.com



More information about the R-devel mailing list