[R] what does cut(data, breaks=n) actually do?

Domenico Vistocco vistocco at unicas.it
Thu Dec 13 10:17:20 CET 2007


cut(data, breaks=n)
splits the range of the data into n intervals of (approximately) equal
width. The bins do not necessarily hold equal numbers of observations.

The width used is obtained by:

    (max(data) - min(data)) / n
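
As a rough sketch of the formula above, the nominal bin width and the
equally spaced cut points can be computed by hand. The vector x below is
made up for illustration; it is not the data used elsewhere in this post.

```r
# Hypothetical data, chosen only for illustration
x <- c(2, 4, 7, 11, 14, 20)
n <- 3

# Nominal bin width: (max - min) / n
width <- (max(x) - min(x)) / n

# n + 1 equally spaced cut points spanning the range of x
breaks <- seq(min(x), max(x), length.out = n + 1)

print(width)
print(breaks)
```

These are only the nominal cut points; the ones cut() reports can differ
slightly from them.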

 > x=rnorm(x)
 > cut(x,breaks=3)
 [1] (1.79,9.97]  (-6.39,1.79] (9.97,18.2]  (9.97,18.2]  (-6.39,1.79]
 [6] (1.79,9.97]  (-6.39,1.79] (1.79,9.97]  (-6.39,1.79] (-6.39,1.79]
Levels: (-6.39,1.79] (1.79,9.97] (9.97,18.2]

Then you have:
 > 18.2-9.97
[1] 8.23
 > 9.97-1.79
[1] 8.18
 > 1.79+6.39
[1] 8.18
 >

 > (max(x)-min(x))/3
[1] 8.164187

I don't know the reason for the small differences (I am wondering about it myself).
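
One likely explanation, going by the help page for cut: when breaks is a
single number, the range is divided into that many pieces of equal length
and the outer limits are then moved away by 0.1% of the range, so that the
extreme values fall inside the intervals. The labels are also rounded to
dig.lab = 3 significant digits by default. Exactly how the extra 0.1% is
spread over the bins may depend on the R version, so the sketch below only
checks the parts that should not. The vector x is hypothetical, not the
data shown above.

```r
# Hypothetical data, not the values from the post above
x <- c(-6.37, -1.2, 0.4, 3.3, 5.8, 9.1, 12.6, 18.1)
n <- 3
dx <- diff(range(x))

# Plain equally spaced cut points: intervals are (a, b], so min(x) is left out
plain <- seq(min(x), max(x), length.out = n + 1)
print(cut(x, breaks = plain))      # the smallest value comes back as NA

# Outer limits pushed out by 0.1% of the range, as described in ?cut
outer <- c(min(x) - dx / 1000, max(x) + dx / 1000)
print(outer)

# cut() with a single number assigns every value to a bin
binned <- cut(x, breaks = n)
print(table(binned, useNA = "ifany"))

# More significant digits in the labels than the default dig.lab = 3
print(levels(cut(x, breaks = n, dig.lab = 6)))
```

Printing the levels with a larger dig.lab should make it easier to see how
much of the discrepancy comes from label rounding alone.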
I hope it is useful.
domenico

Melissa Cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information.  One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in bioDist uses.  When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins.  Can anyone tell me how cut() decides where to cut?
>
> Thanks,
>
> Melissa
>
>
>
> ---------------------------------------------------------------
> Melissa Cline, Independent Investigator
> MCD Biology, UCSC