[R] strange classification behaviour

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 11 07:30:01 CET 2005


You could use cut.  The key calculation would be:

   w <- .05; eps <- 1e-5
   breakpoints <- seq(min(kk), max(kk), w)
   breakpoints <- floor( (breakpoints + (w/2) + eps) / w) * w
   values <- cut(kk, c(breakpoints, Inf), right = FALSE)
   values <- ordered(values)

If you don't like the labels produced, add labels = breakpoints as an argument to cut.
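
For reference, here is a self-contained sketch of the above, using the kk vector from the original post. It departs from the snippet in two ways, both of which are assumptions about what is wanted: it passes ordered_result = TRUE to cut instead of calling ordered() afterwards, since ordered() applied to an existing factor drops the empty bins, and it sets labels = breakpoints. Note also that with right = FALSE each value goes to the bin whose lower edge is at or below it, so 0.940343 lands in the 0.9 bin here, whereas Classify rounds to the nearest multiple and returns 0.95.

```r
# Data from the original post
kk <- c(0.854189, 0.374423, 0.522893, 0.670796, 0.913540,
        0.979011, 0.510378, 0.320440, -0.576764, 0.940343)

w <- 0.05; eps <- 1e-5

# Candidate breakpoints, then snapped onto multiples of w
breakpoints <- seq(min(kk), max(kk), w)
breakpoints <- floor((breakpoints + (w/2) + eps) / w) * w

# Left-closed bins [b, b + w); the last bin is open-ended.
# ordered_result = TRUE keeps every bin as a level, including empty ones.
values <- cut(kk, c(breakpoints, Inf), right = FALSE,
              labels = breakpoints, ordered_result = TRUE)
```

table(values) then lists every bin between the smallest and largest breakpoint, including the empty ones.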

On 11/10/05, RenE J.V. Bertin <rjvbertin at gmail.com> wrote:
> Hello,
>
> I've written a routine that takes an input vector and returns a 'binned' version with a requested bin width and converted to an ordered factor by default. It also attempts to make sure that all factor levels intermediate to the input range are present.
>
> This is the code as I currently have it:
>
> Classify <- function( values, ClassWidth=0.05, ordered.factor=TRUE, all=TRUE )
> {
>     valuesName <- deparse(substitute(values))
>     if( is.numeric(values) ){
>          values <- floor( (values+ (ClassWidth/2) ) / ClassWidth ) * ClassWidth
>          # determine the numerical range of the input
>          levels <- range( values, finite=TRUE )
>          if( ordered.factor ){
>               if( all ){
>                    # if we want all levels, construct a levels vector that can be passed to factor's levels argument:
>                    levels <- seq( levels[1], levels[2], by=ClassWidth )
>                    values <- factor(values, levels=levels, ordered=TRUE )
>               }
>               else{
>                    values <- factor(values, ordered=TRUE )
>               }
>          }
>     }
>     else{
>          levels <- range( values, finite=TRUE )
>          if( all ){
>               levels <- seq( levels[1], levels[2], by=ClassWidth )
>               values <- factor( values, levels=levels, ordered=ordered.factor )
>          }
>          else{
>               values <- factor( values, ordered=ordered.factor )
>          }
>     }
>     comment(values) <- paste( comment(values),
>          "; Classify(", valuesName, ", ClassWidth=", ClassWidth, ", ordered.factor=", ordered.factor, ")",
>          sep="")
>     values
> }
>
> This does work, but has some strange side-effects that I think might be due to rounding errors:
>
> ##> kk <- c( 0.854189, 0.374423, 0.522893, 0.670796, 0.913540, 0.979011, 0.510378, 0.320440, -0.576764, 0.940343 )
>
> ##> Classify( kk, ClassWidth=0.05, all=FALSE )
>  [1] 0.85 0.35 0.5  0.65 0.9  1    0.5  0.3  -0.6 0.95
> Levels: -0.6 < 0.3 < 0.35 < 0.5 < 0.65 < 0.85 < 0.9 < 0.95 < 1
> ### result as expected, but using this on the horizontal axis of a graph can be ... surprising.
>
> ##> Classify( kk, ClassWidth=0.05, all=TRUE )
>  [1] 0.85 <NA> 0.5  <NA> <NA> 1    0.5  <NA> -0.6 <NA>
> 33 Levels: -0.6 < -0.55 < -0.5 < -0.45 < -0.4 < -0.35 < -0.3 < -0.25 < -0.2 < -0.15 < -0.1 < -0.05 < 0 < ... < 1
> ##> summary( Classify( kk, ClassWidth=0.05, all=TRUE ) )
>              -0.6              -0.55               -0.5              -0.45               -0.4              -0.35
>                 1                  0                  0                  0                  0                  0
>              -0.3              -0.25               -0.2              -0.15               -0.1              -0.05
>                 0                  0                  0                  0                  0                  0
>                 0 0.0499999999999999                0.1               0.15                0.2               0.25
>                 0                  0                  0                  0                  0                  0
>               0.3               0.35                0.4               0.45                0.5               0.55
>                 0                  0                  0                  0                  2                  0
>               0.6               0.65                0.7               0.75                0.8               0.85
>                 0                  0                  0                  0                  0                  1
>               0.9               0.95                  1               NA's
>                 0                  0                  1                  5
>
> ### ???
>
> What probably happens is that, due to rounding errors, the values in my input that classify to 0.3 or 0.35 are not found in the list of levels that I calculate. Adding an element 0.05 to kk supports this idea.
>
> Is there a way around this, for instance a more robust way to do what I'm trying to do here (or a function provided by R)?
>
> When I modify the relevant code above to
>
>                    levels <- floor( (seq( levels[1], levels[2], by=ClassWidth ) + (ClassWidth/2)) / ClassWidth ) * ClassWidth
>                    values <- factor( values, levels=levels, ordered=TRUE )
>
> the result is as expected, but I find that not very elegant...
>
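
One way to avoid the mismatch without pushing the levels through floor() a second time is to work with integer bin indices and multiply by the class width only when building the labels. The following is a minimal sketch under that assumption, not a drop-in replacement for Classify: classify_idx is a hypothetical name, and floor(x / w + 0.5) is used as the rounding rule.

```r
# Data from the original post
kk <- c(0.854189, 0.374423, 0.522893, 0.670796, 0.913540,
        0.979011, 0.510378, 0.320440, -0.576764, 0.940343)

# Hypothetical helper: bin to the nearest multiple of w via integer indices
classify_idx <- function(x, w = 0.05) {
  idx <- floor(x / w + 0.5)            # bin index: whole numbers, so no drift
  rng <- range(idx, finite = TRUE)
  all_idx <- seq(rng[1], rng[2])       # every index in range, step 1
  # Values and levels are matched as whole numbers; the class width only
  # enters when the labels are built.
  factor(idx, levels = all_idx,
         labels = as.character(all_idx * w), ordered = TRUE)
}

res <- classify_idx(kk)
```

For kk this gives the same values as Classify(kk, all=FALSE), but with all 33 levels present and no NAs.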
> Thanks in advance,
> RenE Bertin