# [R] Question about ggplot2 and stat_smooth

Dennis Murphy djmuser at gmail.com
Tue Oct 4 19:52:23 CEST 2011

```Hi:

The smooth is not going to replicate the quantile estimates you get
from the 'boxplots'. The smooth estimates a conditional mean using
loess, and its confidence limits reflect uncertainty in the estimate
of the conditional mean function, so they are almost certainly going
to be narrower than the corresponding quantiles of the data
distributions.

If you want to mimic the behavior in the 'boxplots', I would save the
information from them into a data frame with one column per quantile
and assign variable names to the quantiles. Then melt that data frame
so that the quantile names become factor levels, with whatever
variable distinguishes the 'boxplots' as the ID variable in melt().
Finally, use ggplot2 or lattice to plot the corresponding sets of
lines.

Here's an example:

library('ggplot2')
library('plyr')
library('reshape')

# Toy data frame
dd <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500))

# Function to compute quantiles and return a data frame
g <- function(d) {
  qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
  names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
  qq
}

# Apply function to each year of data in dd:
qdf <- ddply(dd, .(year), g)
# melt to produce a factor variable whose levels are quantiles
qdfm <- melt(qdf, id = 'year')

# Use ggplot() to plot the boxplots and quantile lines:
ggplot() +
  geom_boxplot(data = dd, aes(x = factor(year), y = y)) +
  geom_line(data = qdfm,
            aes(x = factor(year), y = value,
                group = variable, colour = variable),
            size = 1) +
  labs(x = 'Year', colour = 'Quantile')

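The idea of superimposing the lines over the boxplots is to show that
the default method of quantile() gives the same quartiles that ggplot2
uses to draw the boxes: the Q25, Q50 and Q75 lines pass through the
hinges and medians. The Q5 and Q95 lines will not generally meet the
whisker ends, since ggplot2 draws the whiskers out to at most 1.5
times the IQR from the hinges.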
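
# A minimal sketch for checking this numerically; it is not part of the
# original example. It assumes a ggplot2 version that provides
# layer_data(), and the names chk, p, bp and q are only illustrative.
library('ggplot2')
set.seed(1)
chk <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500))
p <- ggplot(chk, aes(x = factor(year), y = y)) + geom_boxplot()
bp <- layer_data(p)   # one row per box; columns include lower, middle, upper
# Quartiles per year from the default quantile() method (3 x 9 matrix)
q <- sapply(split(chk$y, chk$year), quantile, probs = c(.25, .50, .75))
# Each comparison should be TRUE if the box statistics match quantile()
all.equal(bp$lower,  unname(q['25%', ]))
all.equal(bp$middle, unname(q['50%', ]))
all.equal(bp$upper,  unname(q['75%', ]))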

Is that closer to what you're after? If you want, you can always use
geom_ribbon() to shade the areas between the lines and
scale_colour_manual() to manually specify the line colors. Using the
above example, here's one way, using the unmelted quantile data:

ggplot(qdf, aes(x = year, y = Q50)) +
  geom_line(size = 2, color = 'navyblue') +
  geom_ribbon(aes(ymin = Q25, ymax = Q75), fill = 'blue', alpha = 0.4) +
  geom_ribbon(aes(ymin = Q5, ymax = Q25), fill = 'blue', alpha = 0.2) +
  geom_ribbon(aes(ymin = Q75, ymax = Q95), fill = 'blue', alpha = 0.2) +
  labs(x = 'Year', y = 'Y')
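
# A sketch of the scale_colour_manual() option mentioned above, using the
# melted data qdfm from the first example (so it is not standalone). The
# colour values are arbitrary choices, and the names in values = must
# match the quantile names Q5, Q25, Q50, Q75 and Q95.
ggplot(qdfm, aes(x = year, y = value, colour = variable)) +
  geom_line() +
  scale_colour_manual(values = c(Q5 = 'lightblue', Q25 = 'steelblue',
                                 Q50 = 'navyblue', Q75 = 'steelblue',
                                 Q95 = 'lightblue')) +
  labs(x = 'Year', y = 'Y', colour = 'Quantile')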

Dennis

On Tue, Oct 4, 2011 at 10:01 AM,  <Thomas.Adams at noaa.gov> wrote:
>
> Thanks for responding. No, not smoothed quantile regression. If you go here: http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored squares, you can see we have 'boxplots'. What I want to express is the uncertainty as depicted in the example from my previous email where I can specify the limits calculated for the 'boxplots' using  5%, 25%,75%, 95% limits as we have with the 'boxplots'.
>
> Tom
>
> ----- Original Message -----
> Date: Tuesday, October 4, 2011 10:23 am
> Subject: Re: [R] Question about ggplot2 and stat_smooth
> Cc: R-help forum <r-help at r-project.org>
>
>
>> On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams <Thomas.Adams at noaa.gov>
>> wrote:
>> >  I'm interested in creating a graphic -like- this:
>> >
>> > c <- ggplot(mtcars, aes(qsec, wt))
>> > c + geom_point() + stat_smooth(fill="blue", colour="darkblue",
>> >                                size=2, alpha=0.2)
>> >
>> > but I need to show 2 sets of bands (with different shading) using
>> > 5%, 25%, 75%, 95% limits that I specify and where the heavy blue
>> > line is the median. I don't understand how to do this with ggplot2.
>>
>> Exactly what sort of limits do you want?  It sounds like maybe you are
>> looking for smoothed quantile regression.
>>
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>>