[R] correction to the previously asked question (about merging factors)

Spencer Graves spencer.graves at pdf.com
Fri Feb 6 00:44:57 CET 2004


      Thanks, Peter. 

      So Sundar's more elegant solution is equivalent to my initial 
response to this question -- which shows how much one can lose trying to 
be too clever. 

      Best Wishes,
      spencer graves

Peter Dalgaard wrote:

>Spencer Graves <spencer.graves at pdf.com> writes:
>
>  
>
>>      Sundar:  Your solution is not only more elegant than mine, it's
>>also faster, at least with this tiny example:
>>
>> > start.time <- proc.time()
>> > k1 <- length(F1)
>> > k2 <- length(F2)
>> > F12.lvls <- unique(c(levels(F1), levels(F2)))
>> > F. <- factor(rep(F12.lvls[1], k1+k2), levels=F12.lvls)
>> > F.[1:k1] <- F1
>> > F.[-(1:k1)] <- F2
>> > proc.time()-start.time
>>[1] 0.00 0.00 0.42   NA   NA
>> >
>> > start.time <- proc.time()
>> > F1 <- factor(c("b", "a"))
>> > F2 <- factor(c("c", "b"))
>> > F3 <- factor(c(levels(F1)[F1], levels(F2)[F2]))
>> > proc.time()-start.time
>>[1] 0.00 0.00 0.24   NA   NA
>> >
>>      With longer vectors, mine may be faster -- but yours is still
>>more elegant.
>>
>>      Best Wishes,
>>      spencer graves
>>    
>>
>
>Actually, Sundar's solution is exactly equivalent to the 
>
>factor(c(as.character(F1),as.character(F2)))
>
>that several have suggested, and which may actually be good enough for
>the vast majority of cases. It is in fact the same thing that goes on
>inside rbind.data.frame (that uses as.vector, which is equivalent).
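
The equivalence can be checked directly. This is a minimal sketch using
the same F1 and F2 as in the timing example; the names via_levels and
via_char are made up for illustration. It relies on the fact that
indexing with a factor uses its integer codes, so levels(F1)[F1] yields
the same character vector as as.character(F1).

```r
F1 <- factor(c("b", "a"))
F2 <- factor(c("c", "b"))

# Sundar's form: index the level strings by the factor's integer codes
via_levels <- factor(c(levels(F1)[F1], levels(F2)[F2]))

# The as.character() form suggested by several people
via_char <- factor(c(as.character(F1), as.character(F2)))

# Both go through the same character vector, so the results should agree
same_values <- identical(as.character(via_levels), as.character(via_char))
same_levels <- identical(levels(via_levels), levels(via_char))
print(c(same_values, same_levels))
```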
>
>If you really want something optimal, in the sense of not allocating a
>large amount of character strings and comparing them individually to
>a joint level set, I think you need something like this:
>
>l1 <- levels(F1)
>l2 <- levels(F2)
>ll <- sort(unique(c(l1, l2)))
>m1 <- match(l1, ll)
>m2 <- match(l2, ll)
>factor(c(m1[F1], m2[F2]), labels=ll)
>
>or if you want to be really hardcore, bypass the inefficiencies inside
>factor() and do
>
>structure(c(m1[F1], m2[F2]), levels=ll, class="factor")
>
>(People have been known to regret coding with explicit calls to
>structure(), though...)
>
>  
>
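
For reference, the match()-based approach quoted above can be wrapped
in a small function. This is only a sketch: the name merge_factors is
hypothetical, and it assumes two ordinary (unordered) factors. It uses
the structure() variant, which keeps every level in the joint set even
if some level is unused. The factor(..., labels=ll) variant may fail
when a level is unused, because labels must match the values actually
present; passing levels=seq_along(ll) as well should avoid that.

```r
# Hypothetical helper: merge two factors by remapping integer codes,
# without expanding either factor to a full character vector.
merge_factors <- function(F1, F2) {
  l1 <- levels(F1)
  l2 <- levels(F2)
  ll <- sort(unique(c(l1, l2)))   # joint, sorted level set
  m1 <- match(l1, ll)             # old code -> new code for F1
  m2 <- match(l2, ll)             # old code -> new code for F2
  # Indexing by a factor uses its integer codes, so m1[F1] remaps F1
  structure(c(m1[F1], m2[F2]), levels = ll, class = "factor")
}

F1 <- factor(c("b", "a"))
F2 <- factor(c("c", "b"))
merged <- merge_factors(F1, F2)
print(merged)
```

Only the level strings are compared here, so the matching cost depends
on the number of levels rather than on the length of the factors.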



