[R] How do I do a pretty scatter plot using ggplot2?

R. Michael Weylandt michael.weylandt at gmail.com
Sat Mar 10 04:21:55 CET 2012


Could you just add a log scale to the y dimension?

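# y must be strictly positive for log = "y"; rnorm() would have its
# non-positive values dropped with a warning
DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2)

plot(y ~ x, data = DAT, log = "y")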

That lessens large dispersion (in some circumstances), but I'm not
really sure what that has to do with smoothing. Do you mean
"smoothing" in the technical sense (loess, splines, and friends) or in
some graphical sense?

Still not sure what this has to do with quantile plots: they are
usually diagnostic tools for examining distributional shape/fit.

Here are two (related) ideas:

i) If you have categorical x data, boxplots:
http://had.co.nz/ggplot2/geom_boxplot.html

ii) If you have continuous x data, quantile "envelopes":
http://had.co.nz/ggplot2/stat_quantile.html

# In ggplot2 (stat_quantile() needs the quantreg package installed)

library(ggplot2)

DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
DAT$xbin <- with(DAT, cut(x, seq(0, 20, 2)))

# newer ggplot2 versions spell ..quantile.. as after_stat(quantile)
p <- ggplot(DAT, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  stat_quantile(aes(colour = ..quantile..),
                quantiles = seq(0.05, 0.95, by = 0.05)) +
  facet_wrap(~ xbin, scales = "free")
print(p)
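
For idea (i), a minimal sketch might look like the following. It treats the binned x as the categorical variable; the bin width of 2 and the name p_box are just choices made for this example, and the data are simulated.

```r
library(ggplot2)

# Simulated data, binned into ten equal-width x bins (width 2 is arbitrary)
DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
DAT$xbin <- cut(DAT$x, seq(0, 20, 2))

# One boxplot of y per x bin
p_box <- ggplot(DAT, aes(x = xbin, y = y)) + geom_boxplot()
print(p_box)
```

Each box then shows the median and quartiles of y within its bin, which is a coarse version of the quantile envelope.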

What I'd really do, if you had lots of data, would be to bin x into
small contiguous bins and to calculate quantiles for each of those
bins and to plot smoothers across the quantiles (using bin medians as
the x axis) -- I'm sure that's doable in ggplot2 as well.
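
A rough sketch of that last idea, in case it helps. Everything specific here is an assumption for illustration: the simulated data, the bin width of 0.5, the particular set of quantiles, the loess smoother, and the made-up column names xmed, prob, and yq.

```r
library(ggplot2)

# Simulated stand-in for "lots of data"
DAT <- data.frame(x = runif(5000, 0, 20), y = rnorm(5000))

# Small contiguous, equal-width bins on x (width 0.5 is an arbitrary choice)
DAT$xbin <- cut(DAT$x, breaks = seq(0, 20, by = 0.5), include.lowest = TRUE)

probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# One row per (bin, quantile): the bin's median x and that quantile of y
binq <- do.call(rbind, lapply(split(DAT, DAT$xbin, drop = TRUE), function(d) {
  data.frame(xmed = median(d$x),
             prob = probs,
             yq   = unname(quantile(d$y, probs)))
}))

# Smooth each quantile across bins, using the bin medians as the x axis
p <- ggplot(binq, aes(x = xmed, y = yq, colour = factor(prob))) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "bin median of x", y = "quantile of y within bin",
       colour = "quantile")
print(p)
```

Swapping seq(0, 20, by = 0.5) for quantile(DAT$x, seq(0, 1, by = 0.025)) as the breaks would give bins with roughly equal counts instead of equal widths.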

Michael

On Fri, Mar 9, 2012 at 10:00 PM, Michael <comtech.usa at gmail.com> wrote:
> The origin of this problem was a plain scatter plot with too many
> highly dispersed points, which left points flying all over the
> place.
>
> We are trying to smooth the charts a bit...
>
> Any good recommendations?
>
> Thanks a lot!
>
> On Fri, Mar 9, 2012 at 8:59 PM, Michael <comtech.usa at gmail.com> wrote:
>>
>> Sorry for the confusion Michael.
>>
>> I myself am trying to figure out what my boss is requesting:
>>
>> I am certain that I need to "plot the quantiles of each bin.  " ...
>>
>> But how are the quantiles plotted? Shall I specify 50% quantile, etc?
>>
>> Being a diligent guy, I am trying hard to do some homework and figure it
>> out myself...
>>
>> I thought there was a standard statistical procedure that everybody
>> knows...
>>
>> Any more thoughts?
>>
>> Thanks a lot!
>>
>>
>> On Fri, Mar 9, 2012 at 8:51 PM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>>>
>>> On Fri, Mar 9, 2012 at 9:28 PM, Michael <comtech.usa at gmail.com> wrote:
>>> > Thanks a lot Mike!
>>> >
>>>
>>> Michael if you don't mind. (Though admittedly it leads to some degree
>>> of confusion in a conversation like this)
>>>
>>> > Could you please explain your code a bit?
>>>
>>> Which part?
>>>
>>> >
>>> > My imagination is that for each bin, I am plotting a line which is the
>>> > quantile of the y-values in that bin?
>>>
>>> Oh, so you want a qqnorm()-esque line? How is that like a scatterplot?
>>>
>>> ....yes, that's something else entirely (and not clear from your first
>>> post -- to my ear the "quantile" is a statistic tied to the [e]cdf).
>>> This is actually much easier in ggplot (and certainly doable in base
>>> as well).
>>>
>>> Try this,
>>>
>>> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000)) # Not so volatile this time
>>> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))
>>>
>>> library(ggplot2)
>>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + stat_qq(aes(sample = y))
>>>
>>> print(p)
>>>
>>> If this isn't what you want, please spend some time to show an example
>>> of the sort of graph you desire (it can be a bit of code or a link to
>>> a picture or even a hand sketch hosted somewhere online)
>>>
>>> Out on a limb, I think you might really be thinking of something more
>>> like this:
>>>
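>>> # rank y within each bin first; sort(y) inside aes() would sort across
>>> # the whole data set rather than within each facet
>>> DAT$yrank <- with(DAT, ave(y, xbin, FUN = rank))
>>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + geom_step(aes(x = yrank, y = y))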
>>>
>>> and see this for more: http://had.co.nz/ggplot2/geom_step.html
>>>
>>> Michael Weylandt
>>>
>>> >
>>> > I ran your program but couldn't figure out the meaning of the dots in
>>> > your plot?
>>> >
>>> > Thanks again!
>>> >
>>> > On Fri, Mar 9, 2012 at 7:07 PM, R. Michael Weylandt
>>> > <michael.weylandt at gmail.com> wrote:
>>> >>
>>> >> That doesn't really seem to make sense to me as a graphical
>>> >> representation (transforming adjacent y values differently), but if
>>> >> you really want to do so, here's what I'd do if I understand your goal
>>> >> (the preprocessing is independent of the graphics engine):
>>> >>
>>> >> DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2) # Nice and volatile!
>>> >>
>>> >> # split y based on some x binning and assign empirical quantiles of each group
>>> >>
>>> >> DAT$yquant <- with(DAT, ave(y, cut(x, seq(0, 20, 5)),
>>> >>                             FUN = function(x) ecdf(x)(x)))
>>> >>
>>> >> # BASE
>>> >> plot(yquant ~ x, data = DAT)
>>> >>
>>> >>  # ggplot2
>>> >> library(ggplot2)
>>> >>
>>> >> p <- ggplot(DAT, aes(x = x, y = yquant)) + geom_point()
>>> >> print(p)
>>> >>
>>> >> Michael Weylandt
>>> >>
>>> >> PS -- I see Josh Wiley just responded pointing out your requirements
>>> >> #1 and #2 are incompatible: I've used 1 here.
>>> >>
>>> >> On Fri, Mar 9, 2012 at 7:37 PM, Michael <comtech.usa at gmail.com> wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > I am trying hard to do the following and have already spent a few
>>> >> > hours in vain:
>>> >> >
>>> >> > I want to do a scatter plot.
>>> >> >
>>> >> > But given the high dispersion of those dots, I would like to bin the
>>> >> > x-axis and then, for each bin, plot the quantiles of the y-values
>>> >> > of the data points in that bin:
>>> >> >
>>> >> > 1. Uniform bin size on the x-axis;
>>> >> > 2. Equal number of observations in each bin;
>>> >> >
>>> >> > How to do that in R? I guess for the sake of prettiness, I'd better
>>> >> > do it in ggplot2?
>>> >> >
>>> >> > Thank you!