[R] Violin plot of categorical/binned data

Jim Lemon jim at bitwrit.com.au
Sun Nov 4 01:47:55 CET 2012


On 11/04/2012 06:27 AM, Nathan Miller wrote:
> Hi,
>
> I'm trying to create a plot showing the density distribution of some
> shipping data. I like the look of violin plots, but my data is not
> continuous but rather binned and I want to make sure its binned nature (not
> smooth) is apparent in the final plot. So for example, I have the number of
> individuals per vessel, but rather than having the actual number of
> individuals I have data in the format of: 7 values of zero, 11 values
> between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To
> plot this data I generated a new dataset with the first 7 values being 0,
> representing the 7 values of 0, the next 11 values being 5.5, representing
> the 11 values between 1-10, etc. Sample data below.
>
> I can make a violin plot (code below) using a log y-axis, which looks
> alright (though I do have to deal with the zeros still), but in its default
> format it hides the fact that these are binned data, which seems a bit
> misleading. Is it possible to make a violin plot that looks a bit more
> angular (more corners, less smoothing) or in some way shows the
> distribution, but also clearly shows the true nature of these data? I've
> tried playing with the bandwidth adjustment and the kernel but haven't been
> able to get a figure that seems to work.
>
> Anyone have some thoughts on this?
>
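The expansion described above (one representative value per observation in each bin) can be sketched in base R with rep(). This is a minimal sketch. The counts are the four bins quoted in the question. The values 55 and 550 are assumed arithmetic midpoints for the 10-100 and 100-1000 bins, because the question only gives 5.5 for the 1-10 bin.

```r
# Counts per bin, as quoted in the question (later bins elided by "etc.")
counts <- c(7, 11, 6, 13)
# Representative value for each bin: 0 and 5.5 come from the question;
# 55 and 550 are ASSUMED arithmetic midpoints of 10-100 and 100-1000
reps <- c(0, 5.5, 55, 550)
# Repeat each representative value once per observation in its bin
x <- rep(reps, times = counts)
table(x)
# On a log axis the zeros become -Inf, which is the "zeros" problem
sum(is.infinite(log10(x)))  # 7
```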
Hi Nate,
I'm not exactly sure what you are doing in the data transformation, but 
you can display binned counts like these directly, either as a single 
polygon for each vessel (kiteChart) or as separate rectangles for each 
vessel and bin (battleship.plot). Both functions are in the plotrix package.

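library(plotrix)
# fix the random numbers so the example is reproducible
set.seed(42)
# simulated counts: one row per vessel, one column per size bin
vessels<-matrix(c(zero=sample(1:10,5),one2ten=sample(5:20,5),
  ten2hundred=sample(15:36,5),hundred2thousand=sample(10:16,5)),
  ncol=4)
# one rectangle per vessel/bin cell, its width proportional to the count
battleship.plot(vessels,xlab="Number of passengers",
  yaxlab=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  xaxlab=c("0","1-10","10-100","100-1000"))
# one polygon ("kite") per vessel, its width varying across the bins
kiteChart(vessels,xlab="Number of passengers",ylab="Vessel",
  varlabels=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  timelabels=c("0","1-10","10-100","100-1000"))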
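If you want something closer in shape to a violin, one option (a swapped-in technique, not part of plotrix) is a "stepped violin" drawn with base graphics: one rectangle per bin, centred on the plotting position, with half-width proportional to the bin count, so the bin edges stay visible instead of being smoothed away. stepped_violin() below is a hypothetical helper. Placing the bins at equal spacing on the y axis is an assumption: it roughly matches a log axis for the decade bins and gives the zero bin its own row.

```r
# Hypothetical helper: draw a stepped violin for binned counts.
# Bin i occupies y in [i - 0.5, i + 0.5]; its box spans at +/- half[i].
stepped_violin <- function(counts, at = 1, maxwidth = 0.4, col = "grey80") {
  k <- length(counts)
  half <- maxwidth * counts / max(counts)   # half-width of each bin's box
  rect(at - half, seq_len(k) - 0.5, at + half, seq_len(k) + 0.5, col = col)
  invisible(half)
}

counts <- c(7, 11, 6, 13)   # bin counts quoted in the question
bins <- c("0", "1-10", "10-100", "100-1000")
# empty plot with one row per bin
plot(NA, xlim = c(0.5, 1.5), ylim = c(0.5, length(counts) + 0.5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "Individuals per vessel")
axis(2, at = seq_along(bins), labels = bins, las = 1)
h <- stepped_violin(counts)
```

Widening xlim and calling stepped_violin() again with a different at value would place further vessels side by side.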

Jim



