# [R] what does cut(data, breaks=n) actually do?

Domenico Vistocco vistocco at unicas.it
Thu Dec 13 10:17:20 CET 2007

cut(data, breaks=n) splits the range of the data into n bins of
(approximately) the same width.

The bin width is obtained as:
(max(data) - min(data)) / n

> x=rnorm(x)
> cut(x,breaks=3)
[1] (1.79,9.97]  (-6.39,1.79] (9.97,18.2]  (9.97,18.2]  (-6.39,1.79]
[6] (1.79,9.97]  (-6.39,1.79] (1.79,9.97]  (-6.39,1.79] (-6.39,1.79]
Levels: (-6.39,1.79] (1.79,9.97] (9.97,18.2]

Then you have:
> 18.2-9.97
[1] 8.23
> 9.97-1.79
[1] 8.18
> 1.79+6.39
[1] 8.18
>

> (max(x)-min(x))/3
[1] 8.164187

I don't know the reason for the small differences (I have been wondering about it myself).
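
A possible explanation for the small differences, going by the help page for cut: when breaks is a single number, the range of the data is divided into equal pieces and the outer limits are then moved away by 0.1% of the range, so that the extreme values fall inside a bin. The labels are also rounded to 3 significant digits (the dig.lab default), so widths computed from the labels are only approximate. Below is a minimal sketch of that idea; the data, the seed and the names by_hand and built_in are made up for illustration, and the exact internals may differ between R versions.

```r
# Hypothetical data; any numeric vector with a non-zero range would do.
set.seed(1)
x <- rnorm(10, mean = 5, sd = 8)
n <- 3

# Equal-width break points over the observed range.
rx <- range(x)
dx <- diff(rx)
breaks <- seq.int(rx[1], rx[2], length.out = n + 1)

# Push the two outer limits outwards by 0.1% of the range,
# as described in ?cut, so that min(x) and max(x) get a bin.
breaks[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

# Bin with the explicit breaks and with breaks = n, then cross-tabulate
# the bin codes to see whether the two assignments agree.
by_hand  <- cut(x, breaks = breaks)
built_in <- cut(x, breaks = n)
print(table(as.integer(by_hand), as.integer(built_in)))

# Unrounded bin widths, and labels printed with more digits than the
# default dig.lab = 3, to separate the rounding from the 0.1% shift.
print(diff(breaks))
print(levels(cut(x, breaks = n, dig.lab = 6)))
```

Comparing diff(breaks) with (max(x) - min(x)) / n should show how much of the discrepancy comes from the shifted outer limits and how much from the rounded labels.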
I hope it is useful.
domenico

melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information.  One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in bioDist uses.  When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins.  Can anyone tell me how cut() decides where to cut?
>
> Thanks,
>
> Melissa
>
>
>
> ---------------------------------------------------------------
> Melissa Cline, Independent Investigator
> MCD Biology, UCSC