[R] How do I do a pretty scatter plot using ggplot2?

Sat Mar 10 03:51:51 CET 2012

On Fri, Mar 9, 2012 at 9:28 PM, Michael <comtech.usa at gmail.com> wrote:
> Thanks a lot Mike!
>

Michael, if you don't mind (though admittedly that leads to some degree
of confusion in a conversation like this).

> Could you please explain your code a bit?

Which part?

>
> My imagination is that for each bin, I am plotting a line which is the
> quantile of the y-values in that bin?

Oh, so you want a qqnorm()-esque line? How is that like a scatterplot?

...yes, that's something else entirely, and it wasn't clear from your first
post: to my ear, a "quantile" is a statistic tied to the [e]cdf.
This is actually much easier in ggplot (and certainly doable in base
as well).

Try this,

DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000)) # Not so volatile this time
DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))

library(ggplot2)
p <- ggplot(DAT) + facet_wrap( ~ xbin) + stat_qq(aes(sample = y))

print(p)
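
For reference, here is a rough base-R sketch of what each facet of that
plot should contain, assuming stat_qq() is left at its default normal
reference distribution. The set.seed() call and the name qq_by_bin are
additions for illustration only and are not part of the code above:

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))

# One set of QQ coordinates per x bin: theoretical normal quantiles
# paired with the sorted sample values of y in that bin.
qq_by_bin <- lapply(split(DAT$y, DAT$xbin), function(v) {
  qq <- qqnorm(v, plot.it = FALSE)
  data.frame(theoretical = sort(qq$x), sample = sort(qq$y))
})

table(DAT$xbin)       # observations per bin (similar, but not identical, counts)
head(qq_by_bin[[1]])  # first few QQ points of the first bin
```

Each element of qq_by_bin corresponds to one panel: the bins have equal
width, so the number of points per panel varies a little.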

If this isn't what you want, please take some time to show an example
of the sort of graph you desire (it can be a bit of code, a link to
a picture, or even a hand sketch hosted somewhere online).

Out on a limb, I think you might really be thinking of something more
like this:

p <- ggplot(DAT) + facet_wrap( ~ xbin) +
  geom_step(aes(x = seq_along(y), y = sort(y)))

and see this for more: http://had.co.nz/ggplot2/geom_step.html
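
One caveat worth checking with the geom_step() call above: expressions
inside aes() are evaluated on the whole data frame rather than once per
facet, so sort(y) orders all 1000 values together before they are split
into panels. A minimal sketch that sorts within each bin first, assuming
the goal is one empirical-quantile step curve per bin (the names STEP and
rank, and the seed, are hypothetical additions):

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))

# Order rows by bin, then by y, so each bin's values are sorted on their own.
STEP <- DAT[order(DAT$xbin, DAT$y), ]

# Within-bin rank 1..n_bin, used as the horizontal position of each step.
STEP$rank <- ave(STEP$y, STEP$xbin, FUN = seq_along)

# Plot only if ggplot2 is installed; the preprocessing above is base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(STEP, aes(x = rank, y = y)) + geom_step() + facet_wrap(~ xbin)
  print(p)
}
```

Because the sorting happens before plotting, each panel only ever sees its
own bin's values, and the same STEP data frame works with base graphics too.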

Michael Weylandt

>
> I ran your program but couldn't figure out the meaning of the dots in your
> plot?
>
> Thanks again!
>
> On Fri, Mar 9, 2012 at 7:07 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> That doesn't really seem to make sense to me as a graphical
>> representation (transforming adjacent y values differently), but if
>> you really want to do so, here's what I'd do if I understand your goal
>> (the preprocessing is independent of the graphics engine):
>>
>> DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2) # Nice and volatile!
>>
>> # split y based on some x binning and assign empirical quantiles of each group
>>
>> DAT$yquant <- with(DAT, ave(y, cut(x, seq(0, 20, 5)),
>>                             FUN = function(x) ecdf(x)(x)))
>>
>> # BASE
>> plot(yquant ~ x, data = DAT)
>>
>>  # ggplot2
>> library(ggplot2)
>>
>> p <- ggplot(DAT, aes(x = x, y = yquant)) + geom_point()
>> print(p)
>>
>> Michael Weylandt
>>
>> PS -- I see Josh Wiley just responded pointing out that your requirements
>> #1 and #2 are incompatible: I've used #1 (uniform bin size) here.
>>
>> On Fri, Mar 9, 2012 at 7:37 PM, Michael <comtech.usa at gmail.com> wrote:
>> > Hi all,
>> >
>> > I am trying hard to do the following and have already spent a few hours
>> > in vain:
>> >
>> > I want to make a scatter plot.
>> >
>> > But given the high dispersion of the points, I would like to bin the
>> > x-axis and then, for each bin, plot the quantiles of the y-values of
>> > the data points in that bin:
>> >
>> > 1. Uniform bin size on the x-axis;
>> > 2. Equal number of observations in each bin;
>> >
>> > How do I do that in R? I guess for the sake of prettiness, I'd better
>> > do it in ggplot2?
>> >
>> > Thank you!
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.