# split(.) woes + "very cheap" fix

Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 6 Jan 1998 18:42:58 +0100

```
>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:

PD> Douglas Bates <bates@stat.wisc.edu> writes:
>> > tapply( Machines$Machine, Machines$Worker, table )
>> $6
>> integer(0)

PD> Yuk! Apparently, the culprit is "split":
ok

>> f1<-gl(2,1,4)
>> f2<-gl(2,2,4)
>> dput(split(f1,f2))
PD> list(1 = factor(c(1, 2), levels=1:0), 2 = factor(c(1, 2), levels=1:0))

>> dput(f1)
PD> structure(factor(c(1, 2, 1, 2), levels=1:2), class = "factor", .Label = c("1", "2"))
>> dput(f2)
PD> structure(factor(c(1, 1, 2, 2), levels=1:2), class = "factor", .Label = c("1", "2"))

PD> I wonder if this is a side effect of the ...hmm... "undesirable
PD> feature" of as.numeric(factor)??

No, it's a different bug. The same thing happens in 0.50.

A very silly workaround (which I am NOT committing to the sources)
that seems to help is

split.default <- function(x, f) {
    ## replace a factor by its integer codes before splitting
    if(is.factor(x)) x <- codes(x)
    .Internal(split(x, as.factor(f)))
}

At least it prevents the SEG.FAULT in the following:

> split.default <- function(x,f) {
+  if(is.factor(x)) x <- codes(x)
+  .Internal(split(x,as.factor(f)))
+ }
> f1<-gl(2,1,4) ; f2<-gl(2,2,4);  dput(split(f1,f2))
list(1 = c(1, 2), 2 = c(1, 2))
> tapply(f1,f2,table)
$1
1 2
1 1

$2
1 2
1 1

> rm(split.default)
> tapply(f1,f2,table)

Process R bus error at Tue Jan  6 18:41:04 1998

--------------

-- Martin
```
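For readers trying the idea in a current R, here is a minimal sketch of what the workaround does: it replaces a factor by its integer codes before splitting. It assumes an R version where `codes()` is no longer available, so `as.integer()` stands in for it. The helper name `split_codes` is hypothetical and not part of the original post, and it calls the public `split()` rather than `.Internal()`.

```r
# Hypothetical helper, not from the original post: same idea as the
# workaround, with as.integer() standing in for the old codes().
split_codes <- function(x, f) {
  # Drop the factor class and keep only the integer codes
  if (is.factor(x)) x <- as.integer(x)
  split(x, as.factor(f))
}

f1 <- gl(2, 1, 4)  # levels alternate: 1 2 1 2
f2 <- gl(2, 2, 4)  # levels in pairs:  1 1 2 2

res <- split_codes(f1, f2)
str(res)
```

As in the transcript above, the pieces come back as plain codes rather than factors, so level labels are lost. That is presumably why the workaround is called silly rather than a real fix.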