[R] ggplot2 boxplot confusion

Chris Friedl cfriedalek at gmail.com
Wed Feb 27 15:07:42 CET 2008


Thanks Thierry.

But this leads to a couple more questions if you don't mind.

1. I tried to extend your example to a grid with the facet_grid command,
with the aim of getting a boxplot of VALUE according to two factors, SERIES
and ID. However, whatever syntax I use gives me an error. For example:

ggplot(mydata, aes(y = VALUE, x = factor(1))) + geom_boxplot() +
scale_x_discrete("") +facet_grid(SERIES ~ ID)

Error: position_dodge requires the following missing aesthetics: x

I tried x = c(SERIES, ID) and similar variants, but they all failed.

Yet I know I can get a grid of density plots with ggplot as follows:

ggplot(mydata, aes(x = VALUE, y = ..density..)) + geom_density() +
facet_grid(ID ~ SERIES)

Yet it doesn't work if I say geom_boxplot. 

I hope you can help me understand where I've gone wrong.
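
A side note on the dummy data, offered as a sketch rather than a diagnosis: because rep() recycles both vectors in step, each SERIES level only ever pairs with a single ID level, so most panels of a SERIES ~ ID grid contain no observations. Whether that is what triggers the position_dodge error is an assumption that has not been confirmed. The check below uses base R only, and the set.seed() call is an addition for reproducibility.

```r
# Rebuild the dummy data from Thierry's example (quoted further down).
series <- c('C2', 'C4', 'C8', 'C10', 'C15', 'C20')
ids <- c('ID1', 'ID2', 'ID3')
set.seed(42)  # assumption: seed added only so the sketch is reproducible
mydata <- data.frame(SERIES = rep(series, 30),
                     ID = rep(ids, 60),
                     VALUE = rnorm(180))

# Count the observations in every SERIES x ID cell.
cells <- table(mydata$SERIES, mydata$ID)
print(cells)

# Number of non-empty cells out of 6 * 3 = 18 possible panels.
n_filled <- sum(cells > 0)
print(n_filled)
```

Building the data with expand.grid(SERIES = series, ID = ids) and then repeating the rows would fill every cell, if a fully crossed grid is what is wanted.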

2. On your point about overlaying box and density plots, I'm not sure I
understand. I thought a boxplot is just a particular view of a density
function, showing median, interquartile range etc. The "vertical" scale is
the same as the density function's "horizontal" scale, isn't it? For example
in the dummy dataset above:

summary(mydata$VALUE)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.54400 -0.64690  0.07417  0.08289  0.77830  2.75900 

and

ggplot(mydata, aes(x = VALUE, y = ..density..)) + geom_density()

produces a density plot whose features on the x-axis are visually close to
the summary values above.
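
To make the two scales concrete, here is a small base R sketch. The names value, five and dens are only illustrative, and rnorm(180) stands in for mydata$VALUE. It suggests that the five-number summary sits on the same axis as the density's x grid, while the density heights are in a different unit altogether.

```r
set.seed(42)            # assumption: any seed, for reproducibility
value <- rnorm(180)     # stand-in for mydata$VALUE

five <- fivenum(value)  # min, lower hinge, median, upper hinge, max
dens <- density(value)  # kernel density estimate with default settings

print(five)
print(range(dens$x))    # grid on the VALUE axis; spans the summary points
print(range(dens$y))    # density heights, not measured in units of VALUE
```

So the boxplot and the density curve share one axis (VALUE), and only the second axis of the density plot has no counterpart in the boxplot.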

My intent was to plot density because the box plot doesn't reveal shape
details such as multiple modes, and to augment with a narrow boxplot to show
some density features such as the position of the median, IQR etc. 
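
A rough sketch of that intent, assuming a ggplot2 release recent enough for geom_boxplot() to accept the orientation argument (3.3.0 or later, which is much newer than the version discussed in this thread). The offset of -0.03 and the width of 0.04 are arbitrary values picked for illustration and would need tuning to the height of the actual density curve.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
mydata <- data.frame(VALUE = rnorm(180))

# Density curve on the VALUE axis, plus a thin horizontal boxplot placed
# just below zero on the density axis. The boxplot statistics are computed
# on VALUE; y only positions the box and has no statistical meaning.
p <- ggplot(mydata, aes(x = VALUE)) +
  geom_density() +
  geom_boxplot(aes(y = -0.03), width = 0.04, orientation = "y")

print(p)
```

The box is drawn along the shared VALUE axis, so the median and IQR line up with the curve above them without the two geoms competing for the density scale.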

Or perhaps I've completely misunderstood your point (highly likely I think).

Thanks again for your help. Much appreciated.





ONKELINX, Thierry wrote:
> 
> Chris,
> 
> 1.
> 
> This code will give you the boxplot that you want. 
> 
> library(ggplot2)
> series <- c('C2','C4','C8','C10','C15','C20')
> ids <- c('ID1','ID2','ID3')
> mydata <-
> data.frame(SERIES=rep(series,30),ID=rep(ids,60),VALUE=rnorm(180))
> ggplot(mydata, aes(y = VALUE, x = factor(1))) + geom_boxplot() +
> scale_x_discrete("")
> 
> But the real power of ggplot2 is when you want a boxplot for each
> category:
> 
> ggplot(mydata, aes(y = VALUE, x = SERIES)) + geom_boxplot()
> 
> 
> 2.
> Overlaying boxplots and density plots seems a bad idea to me as both
> plots are likely to have a different scale.
> 
> HTH,
> 
> Thierry
> 
> 

-- 
View this message in context: http://www.nabble.com/ggplot2-boxplot-confusion-tp15706116p15713934.html
Sent from the R help mailing list archive at Nabble.com.


