[R] Sequential Naming of ggplot .pngs using plyr

Ista Zahn izahn at psych.rochester.edu
Wed Aug 10 23:42:21 CEST 2011


Hi Justin,

On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes <jtor14 at gmail.com> wrote:
> If I have data:
>
> dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))
>
> And want to plot like this:
>
> ctr<-1
> for(i in c('a','b','c','d')){
>    png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),
>        height=8.5,width=11,units='in',pointsize=9,res=300)
>    print(ggplot(dat[,names(dat) %in% c('site',i)],
>                 aes(x=factor(site),y=dat[,i]))+
>          geom_boxplot()+
>          opts(title=paste('plot number',ctr,sep=' ')))
>    dev.off()
>    ctr<-ctr+1
> }
>
> Is there a way to do the same naming using plyr (or data.table or foreach
> which I am not familiar with at all!)?

This does not number the files (they are named after each variable
instead), but the same general idea can be achieved with plyr using

 d_ply(melt(dat, id.vars='site'), .(variable), function(df) {
     # sep='' keeps spaces out of the file name, e.g. plyr_plot_a.png
     png(file=paste('plyr_plot_', unique(df$variable), '.png', sep=''),
         height=8.5, width=11, units='in', pointsize=9, res=300)
     print(ggplot(df, aes(x=factor(site), y=value)) + geom_boxplot())
     dev.off()
 })
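
If the numbered file names matter, one possibility is to iterate over
the column positions rather than over the melted data, so that the
index doubles as the counter. This is only a sketch: plot_one, vars and
out_dir are names made up for the example, the files go to tempdir()
rather than /tmp, and ggtitle() stands in for opts(title=), which later
ggplot2 releases removed. It also skips melt() entirely, which is a
choice made here rather than anything from the original code.

```r
library(plyr)
library(ggplot2)

set.seed(1)
dat <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20),
                  site = rep(letters[5:8], each = 5))

vars <- c("a", "b", "c", "d")   # columns to plot, in the order they are numbered
out_dir <- tempdir()            # assumption: swap in "/tmp" or a report directory

# Hypothetical helper: draw the boxplot for the ctr-th variable and
# save it as plot_number_<ctr>.png with a matching title.
plot_one <- function(ctr) {
  v <- vars[ctr]
  df <- data.frame(site = factor(dat$site), value = dat[[v]])
  p <- ggplot(df, aes(x = site, y = value)) +
    geom_boxplot() +
    ggtitle(paste("plot number", ctr))
  path <- file.path(out_dir, paste("plot_number_", ctr, ".png", sep = ""))
  png(filename = path, height = 8.5, width = 11, units = "in",
      pointsize = 9, res = 300)
  print(p)
  dev.off()
  invisible(path)
}

# l_ply discards the return values; only the files are of interest here.
l_ply(seq_along(vars), plot_one)
```

Each call writes one file, plot_number_1.png through plot_number_4.png,
into out_dir, with the counter repeated in the plot title.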

I'm not up to speed on .parallel, foreach, etc., so I'll leave the rest
to someone else.

Best,
Ista
>
> m.dat<-melt(dat,id.vars='site')
> ddply(m.dat,.(variable),function(df)
> print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)
>
> And better yet, is there a way to do it using .parallel=T?
>
> Faceting is not really an option (unless I can facet onto multiple pages of
> a pdf or something) because these need to go into reports as individually
> labelled and titled plots.
>
>
> As a bit of a corollary, is it really worth the headache to resolve this if
> I am only using melt/plyr to split on the four letter variables? With a
> larger set of data (1e6 rows), the melt/plyr version takes a significant
> amount of time but .parallel=T drops the time significantly.  Is the right
> answer a foreach loop and can I do that with the increasing counter? (I
> haven't gotten beyond Hadley's .parallel feature in my parallel R
> dealings.)
>
>> dat<-data.frame(a=rnorm(1e6),b=rnorm(1e6),c=rnorm(1e6),d=rnorm(1e6),
> +                 site=rep(letters[5:8],each=2.5e5))
>> ctr<-1
>> system.time(for(i in c('a','b','c','d')){
> +     png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),
> +         height=8.5,width=11,units='in',pointsize=9,res=300)
> +     print(ggplot(dat[,names(dat) %in% c('site',i)],
> +                  aes(x=factor(site),y=dat[,i]))+
> +           geom_boxplot()+
> +           opts(title=paste('plot number',ctr,sep=' ')))
> +     dev.off()
> +     ctr<-ctr+1
> + })
>   user  system elapsed
>  54.630   0.120  54.843
>
>> system.time(
> + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
> +     png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',
> +         pointsize=9,res=300)
> +     print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
> +     dev.off()
> +     },.parallel=F)
> + )
>   user  system elapsed
>  58.40    0.13   58.63
>
>> system.time(
> + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
> +     png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',
> +         pointsize=9,res=300)
> +     print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
> +     dev.off()
> +     },.parallel=T)
> + )
>   user  system elapsed
>  70.33    3.46   27.61
>>
>
> How might I speed this up and include the sequential plot names?
>
> Thanks a bunch!
>
> Justin
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


