split(.) woes + "very cheap" fix

Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 6 Jan 1998 18:42:58 +0100


>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:

    PD> Douglas Bates <bates@stat.wisc.edu> writes:
    >> > tapply( Machines$Machine, Machines$Worker, table )
    >> $6
    >> integer(0)

    PD> Yuk! Apparently, the culprit is "split":
ok

    >> f1<-gl(2,1,4)
    >> f2<-gl(2,2,4)
    >> dput(split(f1,f2))
    PD> list(1 = factor(c(1, 2), levels=1:0), 2 = factor(c(1, 2), levels=1:0))

    >> dput(f1)
    PD> structure(factor(c(1, 2, 1, 2), levels=1:2), class = "factor", .Label = c("1", "2"))
    >> dput(f2)
    PD> structure(factor(c(1, 1, 2, 2), levels=1:2), class = "factor", .Label = c("1", "2"))

    PD> I wonder if this is a side effect of the ...hmm... "undesirable
    PD> feature" of as.numeric(factor)??

No, it's a different bug.  The same thing happens in 0.50.


A very silly workaround (which I am NOT committing to the sources)
that seems to help is

  split.default <- function(x, f) {
      # strip a factor down to its integer codes before splitting
      if (is.factor(x)) x <- codes(x)
      .Internal(split(x, as.factor(f)))
  }

At least it prevents the crash (a bus error here) in the following:

    > split.default <- function(x,f) {
    +  if(is.factor(x)) x <- codes(x)
    +  .Internal(split(x,as.factor(f)))
    + }
    > f1<-gl(2,1,4) ; f2<-gl(2,2,4);  dput(split(f1,f2))
    list(1 = c(1, 2), 2 = c(1, 2))
    > tapply(f1,f2,table)
    $1
    1 2 
    1 1 

    $2
    1 2 
    1 1 

    > rm(split.default)
    > tapply(f1,f2,table)

    Process R bus error at Tue Jan  6 18:41:04 1998
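
For anyone trying the same idea on a current R, here is a minimal sketch
of the workaround.  It rests on two assumptions that are not in the
message above: codes() is no longer available in current R, so
as.integer() stands in for it, and the helper name split_codes is made
up for illustration.  It calls the ordinary split() rather than
.Internal(), so it only illustrates the "split the codes, not the
factor" idea and is not a patch for the bug itself.

```r
# Hypothetical helper (name is illustrative, not from the original post).
# Same idea as the workaround: split the integer codes, not the factor.
split_codes <- function(x, f) {
  # as.integer() is used here in place of codes() (an assumption for
  # current R); it drops the factor attributes and keeps the codes.
  if (is.factor(x)) x <- as.integer(x)
  split(x, as.factor(f))
}

f1 <- gl(2, 1, 4)   # factor with values 1 2 1 2
f2 <- gl(2, 2, 4)   # factor with values 1 1 2 2

res <- split_codes(f1, f2)
str(res)            # a list with components "1" and "2", each holding 1 2
```

The result has the same shape as the dput() output in the transcript:
two components named 1 and 2, each holding the codes 1 and 2.  The
price is the same as in the original workaround: the pieces are plain
integers, so the level labels are lost.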

--------------


-- Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._