[R] Boxplot

David Winsemius dwinsemius at comcast.net
Sun Nov 27 06:25:20 CET 2011

On Nov 27, 2011, at 12:15 AM, Jeffrey Joh wrote:

> I'm trying to do the second case among Jim's suggestions.  I used  
> Bert's suggestion and it works great.
> I would also like to ask if anyone is familiar with a package for  
> making box-plots.  I would like to bin my datapoints at defined X  
> intervals and display a boxplot for each bin on the same chart.

Combining `cut` (to define the intervals) and `boxplot` should be  
fairly straightforward.
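
A minimal sketch of that combination, assuming a two-column data frame
named "xypoints" as in Jim's message below. The simulated data and the
cutpoints are made up for illustration. The `varwidth` argument of
`boxplot` draws each box with a width proportional to the square root
of the number of points in its group:

```r
set.seed(1)
# Hypothetical stand-in for the poster's data: skewed x so bins differ in size
xypoints <- data.frame(x = rexp(10000, rate = 1/20), y = rnorm(10000))

# Bin x at fixed, user-chosen cutpoints (these breaks are arbitrary)
xypoints$bin <- cut(xypoints$x, breaks = c(0, 10, 20, 40, 80, Inf),
                    include.lowest = TRUE)

# One box per bin on the same chart; varwidth = TRUE scales each box
# width with the square root of the number of points in its bin
boxplot(y ~ bin, data = xypoints, varwidth = TRUE,
        xlab = "x interval", ylab = "y")
```

`table(xypoints$bin)` shows the counts behind the box widths.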

> In Stata, there is a tool for making these, and it varies the width  
> of the boxplot based on the number of points in each plot.

We have a tool for that, too: `boxplot` has a `varwidth` argument.  
Also study `quantile` a bit, to automatically pick cutpoints that will  
divide the data into groups of approximately equal size.
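
A sketch of the `quantile` approach, again on made-up data (the
variable names and the choice of ten groups are assumptions, not
anything from the original question). Deciles of x serve as the
breaks, so each bin holds roughly the same number of points:

```r
set.seed(2)
# Hypothetical data: skewed x, unrelated y
x <- rexp(10000, rate = 1/20)
y <- rnorm(10000)

# Deciles of x as cutpoints: 11 breaks define 10 bins
breaks <- quantile(x, probs = seq(0, 1, by = 0.1))

# include.lowest = TRUE keeps the minimum of x inside the first bin
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Check the group sizes, then draw one box per bin
print(table(bin))
boxplot(y ~ bin, las = 2, xlab = "", ylab = "y")
```

If x has many tied values the quantiles can coincide, and `cut` will
then refuse the duplicated breaks; wrapping them in `unique()` is one
way around that, at the cost of fewer bins.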

(I use the `cut2` function in the Hmisc package, because it is  
integrated with `rms`, which I use all the time, and because its  
defaults for cut()-ting are more to my liking. It also has a "g="  
parameter that automates the cut( ..., quantile(...)) processing.)
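
A sketch of the `cut2` route, assuming the Hmisc package is installed
(the block skips itself otherwise). The data are made up and g = 10 is
an arbitrary choice:

```r
set.seed(3)
# Hypothetical data: skewed x, unrelated y
x <- rexp(10000, rate = 1/20)
y <- rnorm(10000)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # g = 10 asks cut2 for ten quantile groups of x
  bin <- Hmisc::cut2(x, g = 10)
  print(table(bin))
  boxplot(y ~ bin, las = 2, xlab = "", ylab = "y")
} else {
  message("Hmisc is not installed; skipping the cut2 example")
}
```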

> I am hoping there is a similar tool for R.
> Thank you,
> Jeffrey
> ----------------------------------------
>> Date: Tue, 22 Nov 2011 18:51:05 +1100
>> From: jim at bitwrit.com.au
>> To: johjeffrey at hotmail.com
>> CC: r-help at r-project.org
>> Subject: Re: [R] Binned line plot
>> On 11/22/2011 04:29 PM, Jeffrey Joh wrote:
>>> I have a scatter plot with 10000 points. I would like to add a  
>>> line that bins every 50 points and connects the average of each  
>>> bin. I'm looking for something similar to line type "m" in Stata.
>>> With this dataset of 10000 points, I would also like to bin the  
>>> data and make boxplots at certain intervals, so that I have a set  
>>> of boxplots to represent each bin. I would also like the width of  
>>> each box to be proportional to the number of points in each bin.
>>> How can I make these plots? Is there a simple package to use?
>> Hi Jeffrey,
>> There are three possibilities that come to mind:
>> 1) You want to bin the points based on their order in the data frame.
>> 2) You want to bin the points based on the x or y values of the  
>> coordinates.
>> 3) You want to bin the points based on the x _and_ y values of the
>> coordinates.
>> Number 1 is trivial and has already been answered (assume a two  
>> column
>> data frame of coordinates named "xypoints").
>> #first point - set up a loop to get a vector of averages
>> meanx<-rep(0,200)
>> meany<-rep(0,200)
>> for(index in 1:200) {
>> start<-1+50*(index-1)
>> meanx[index]<-mean(xypoints[start:(start+49),"x"])
>> meany[index]<-mean(xypoints[start:(start+49),"y"])
>> }
>> plot(meanx,meany,type="l")
>> Number 2 requires that you sort the pairs based on the value of the  
>> one
>> you want, then apply the same process as 1 to the sorted pairs.  
>> Number 3
>> is somewhat more difficult.
>> I don't do this much, and some of the people who do map analysis will
>> probably come up with a much better method.
>> Find the most extreme point.
>> Find the 49 points closest to that point to constitute group 1.
>> Remove those points from the data frame.
>> Go back to the first step if there are any points left.
>> You will end up with 200 groups of points that are spatially grouped.
>> Get the centroids and plot as above.
>> Another wild guess from
>> Jim

David Winsemius, MD
West Hartford, CT
