[R] scaling of different data sets in ggplot

Stephen Tucker brown_emu at yahoo.com
Tue Jul 17 11:00:28 CEST 2007


Hi Hadley,

That was my initial thought as well: that having different scales on the
same figure might obscure the structure and meaning of the data. But in some
instances (e.g., publications where page limits are imposed) I think it's
desirable to condense a lot of information onto a single plot (for instance,
if two variables show the same trend even though they are not in the same
units), which means having more than one scale in the same plotting window. I
haven't checked what Tufte, Cleveland, and Wilkinson have to say about this,
but in practice I don't think it's all that uncommon.

I agree that log(z) is an operation on the data set, but representing it
graphically can be accomplished either by plotting log(z) or by plotting z
on a log scale... in either case, I think it would be nice to have an extra
axis showing y and z [and not log(z)].
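
To make the "transform the data" versus "transform the scale" point concrete,
here is a minimal sketch in base graphics. It uses y from my earlier example
(the variable that sits on a log axis there); the seed, the base-10 log and
the names ly and ticks are my own choices for illustration, nothing more.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- runif(100)
y <- exp(x^2)

## Option B needs the transformed data and tick positions in log10 units
ly    <- log10(y)
ticks <- pretty(ly)

op <- par(mfrow = c(1, 2))

## Option A: leave the data alone and let the axis be logarithmic
plot(x, y, log = "y", main = "y on a log axis")

## Option B: plot the transformed data, but label the axis in original units
plot(x, ly, yaxt = "n", ylab = "y", main = "log10(y), relabelled")
axis(2, at = ticks, labels = signif(10^ticks, 3))

par(op)
```

The points land in the same relative positions in both panels; only the way
the axis is built differs. The second panel is the "new variable plus custom
labels" route, and it is the custom-axis step that I can't find in ggplot.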

I haven't tried it in lattice, but in the traditional graphics system it is
quite straightforward. You say that ggplot 'tries to take the good parts of
base and lattice graphics and none of the bad parts' - I'm just trying to
hold you to your word :).

Seriously though, I think the idea of ggplot (and implementation) is really
great. Currently R has many graphics systems, of which I know traditional and
lattice - and both are really fantastic (I plan to learn grid sometime in the
future) and I am fanatical about them. But for students and colleagues who
have less programming experience, I think the learning curve for lattice (to
gain proficiency, that is) may be a tad steep... I've been playing around
with ggplot to see if it would be a gentler introduction to conditioning
plots and analysis of multivariate datasets - which, in a way, I think it
could be - so I'm currently trying to test the limits of its flexibility.
It's true that there are some plotting concepts that are generally
discouraged, but it seems to me that the ultimate discretion should lie with
the user, and the plotting system should give him/her the freedom to choose
[to make a bad plot]. Even Lee Wilkinson says in his book that his grammar
will allow someone to make meaningless plots. One example that comes to mind
is the pie chart - I know pie charts are heavily discouraged, but in some
communities they are commonly used and therefore expected; to communicate to
that particular audience it's sometimes necessary to speak their language...

So, hope you don't mind, but I may ask some more 'can ggplot do this'
questions in the future. But keep up the good work,

Stephen


--- hadley wickham <h.wickham at gmail.com> wrote:

> Hi Stephen,
> 
> You can't do that in ggplot (have two different scales) because I
> think it's generally a really bad idea.  The whole point of plotting
> the data is so that you can use your visual abilities to gain insight
> into the data.  When you have two different scales the positions of
> the two groups are essentially arbitrary - the data only have x values
> in common, not y values.  You essentially have two almost unrelated
> graphs plotted on top of each other.
> 
> On the other hand, for this data, I think it would be reasonable to
> plot log(z) and y on the same scale - the data is transformed, not the
> scales.
> 
> Hadley
> 
> On 7/14/07, Stephen Tucker <brown_emu at yahoo.com> wrote:
> > Dear list (but probably mostly Hadley):
> >
> > In ggplot, operations to modify 'guides' are accessed through grid
> > objects, but I did not find any mention of creating new guides, or of
> > removing them altogether, using ggplot functions. I wonder if I need to
> > learn grid to do this (which I hope to do eventually).
> >
> > Also, ggplot()+geom_object() [where 'object' can be point, line, etc.]
> > or layer() contains the specification for the data, mappings and
> > geoms/stats - but the geoms/stats can be scale-dependent [for
> > instance, log]. So I wonder how different scalings can be applied to
> > different data sets.
> >
> > Below is an example that requires both:
> >
> > x <- runif(100)
> > y <- exp(x^2)
> > z <- x^2+rnorm(100,0,0.02)
> >
> > par(mar=c(5,4,2,4)+0.1)
> > plot(x,y,log="y")
> > lines(lowess(x,y,f=1/3))
> > par(new=TRUE)
> > plot(x,z,col=2,pch=3,yaxt="n",ylab="")
> > lines(lowess(x,z,f=1/3),col=2)
> > axis(4,col=2,col.axis=2)
> > mtext("z",4,line=3,col=2)
> >
> > In ggplot:
> >
> > ## data specification
> > ggplot(data=data.frame(x,y,z)) +
> >
> >   ## first set of points
> >   geom_point(mapping=aes(x=x,y=y)) +
> >   scale_y_log() +
> >
> >   ## second set of points
> >   geom_point(mapping=aes(x=x,y=z),pch=3) +
> >   layer(mapping=aes(x=x,y=z),stat="smooth",method="loess") +
> >   scale_y_continuous()
> >
> > scale_y_log() and scale_y_continuous() appear to apply to both mappings
> > at once, and I can't figure out how to associate them with the intended
> > ones (I expect this will be a desire for size and color scales as well).
> >
> > Of course, I can always try to fool the system by (1) applying the
> > scaling a priori to create a new variable, (2) plotting points from the
> > new variable, and (3) creating a new axis with custom labels. Which then
> > brings me back to ...how to add new guides? :)
> >
> > Thanks,
> >
> > Stephen
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>


