[R] Neat way of using R for pivoting?

Tue Sep 20 18:46:56 CEST 2005

>>> "BANNISTER, Keith" <keith.bannister at astrium.eads.net> 09/20/05 09:46AM >>>
>> 
>> Hi,
>> 
>> I'd like to use R to do what Excel pivot tables do, and plot
>> results.

R does not have pivot tables and I hope that it never does.

My experience with pivot tables is that they encourage poor initial
design followed by post-hoc twiddling that is not easily reproducible.

R encourages proper initial design followed by fixing the core design
in cases where things don't turn out the way you intended.

In R I prefer to work in script files and save them. If the table or
graph does not turn out the way I intended, I just edit the script file
and rerun it. While this may be a little more work than clicking on a
pivot table at first, in the long run I find it saves time.

Consider the situation where you create a table/graph, then a month
later your boss/client/coworker finds some typos in the original data
and needs the table and/or graph recreated with the corrected data (or
maybe a new dataset needs a similar graph/table). With the pivot table
you need to try to remember everything that you clicked on and click on
it again. With the R script file you just fix the data (or load in the
new data), rerun the script, and you're done.
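A minimal sketch of that rerun workflow, assuming the table you need is the per-SNR summary from the example further down. The function name `make.summary` and the corrected value are hypothetical; the point is only that every step lives in code, so a fix to the data is followed by one re-run.

```r
# Hypothetical helper: all table-building steps in one function, so
# corrected or new data only require calling it again.
make.summary <- function(df) {
  m <- tapply(df$timeError, df$SNR, mean)
  s <- tapply(df$timeError, df$SNR, sd)
  # group names come back sorted, so use them as the SNR column
  data.frame(SNR = as.numeric(names(m)), mean = as.vector(m), sd = as.vector(s))
}

# original data
df1 <- data.frame(SNR = rep(c(4, 6, 8), each = 3),
                  timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1))
first <- make.summary(df1)

# a month later one value turns out to be a typo: fix it and rerun
df2 <- df1
df2$timeError[3] <- 1.5   # hypothetical correction
second <- make.summary(df2)
print(second)
```

Only the SNR = 4 row changes in `second`; nothing has to be re-clicked.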

OK, enough of my ranting, on to helping with your problem.

>> I've never used R before, and I've managed to do something, but it's
>> quite a lot of code to do something simple. I can't help but think I'm
>> not "Doing it the R way".
>> 
>> I could be using R for the wrong thing, in which case, please tell me
>> off.
[snip]

"by" is a bit of overkill for this situation; tapply will probably work
better.
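When the result should look like a two-way pivot table (one factor down the rows, another across the columns), tapply also accepts a list of grouping factors and returns a matrix. A minimal sketch; the second factor `method` and all of the numbers are hypothetical:

```r
# Hypothetical data: two timeError readings per SNR level for each of
# two (hypothetical) methods, "A" and "B".
d <- data.frame(SNR       = rep(rep(c(4, 6, 8), each = 2), times = 2),
                method    = rep(c("A", "B"), each = 6),
                timeError = c(1.3, 1.5, 2.1, 2.3, 3.2, 3.4,
                              1.0, 1.2, 1.8, 2.0, 2.6, 3.0))

# rows = SNR, columns = method, cells = mean timeError
pivot <- tapply(d$timeError, list(SNR = d$SNR, method = d$method), mean)
print(round(pivot, 2))

# the same layout with sums instead of means, via a formula interface
print(xtabs(timeError ~ SNR + method, data = d))

# one line per column (method) of the pivot matrix
matplot(as.numeric(rownames(pivot)), pivot, type = "b",
        xlab = "SNR", ylab = "mean timeError")
```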

Try this basic script as a starting place:

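### start ###
# example data: three timeError measurements at each SNR level
my.df <- data.frame( SNR=rep( c(4,6,8), each=3),
	timeError = c(1.3,2.1,1.2,2.1,2.2,2.1,3.2,3.7,3.1))

# per-SNR mean and standard deviation (the "pivot" step)
tmp.mean <- tapply( my.df$timeError, my.df$SNR, mean)
tmp.sd   <- tapply( my.df$timeError, my.df$SNR, sd)

# x positions from tapply's (sorted) group names, so they stay
# aligned with tmp.mean and tmp.sd even if the data are unsorted
tmp.x <- as.numeric(names(tmp.mean))

plot( tmp.x, tmp.mean,
	ylim=range(tmp.mean+3*tmp.sd, tmp.mean-3*tmp.sd),
	xlab='SNR', ylab='timeError')

# vertical bars spanning mean +/- 3 sd
segments(tmp.x, tmp.mean-3*tmp.sd, tmp.x, tmp.mean+3*tmp.sd,
	col='green')

### optional: caps on the bars, then redraw the means on top
points(tmp.x, tmp.mean+3*tmp.sd, pch='-', cex=3, col='green')
points(tmp.x, tmp.mean-3*tmp.sd, pch='-', cex=3, col='green')
points(tmp.x, tmp.mean)

### end script ###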
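A slightly shorter base-R variant of the plotting step: `arrows()` with `angle = 90` draws flat caps and `code = 3` puts a cap at each end, replacing the segments() call plus the optional points() calls. This is an alternative sketch, not part of the script above; it recomputes the summaries so it runs on its own.

```r
my.df <- data.frame(SNR = rep(c(4, 6, 8), each = 3),
                    timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1))

tmp.mean <- tapply(my.df$timeError, my.df$SNR, mean)
tmp.sd   <- tapply(my.df$timeError, my.df$SNR, sd)
# x positions taken from tapply's group names so they line up with the means
tmp.x    <- as.numeric(names(tmp.mean))
lo <- tmp.mean - 3 * tmp.sd
hi <- tmp.mean + 3 * tmp.sd

plot(tmp.x, tmp.mean, ylim = range(lo, hi), xlab = 'SNR', ylab = 'timeError')
# angle = 90 turns the arrow heads into flat caps; code = 3 draws both ends;
# length is the cap size in inches
arrows(tmp.x, lo, tmp.x, hi, angle = 90, code = 3, length = 0.1, col = 'green')
```

Note that arrows() skips (with a warning) any bar of zero length, e.g. a group whose sd is 0.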

This may be even simpler with an add-on package. A quick search shows
the following functions (package in parentheses) that may help:

plotCI(gplots)          Plot Error Bars and Confidence Intervals
errbar(Hmisc)           Plot Error Bars
xYplot(Hmisc)           xyplot and dotplot with Matrix Variables to Plot Error Bars and Bands

plotCI(plotrix)         Plot confidence intervals/error bars

errbar(sfsmisc)         Scatter Plot with Error Bars
plotCI(sfsmisc)         Plot Confidence Intervals / Error Bars

>> Appreciate any helpful hints from the pros.
>> 

hope this helps,

>> Cheers!
>> 
>> p.s. We've been having rather a good time around the office recently with
>> "International Talk Like a Pirate Day" (www.yarr.org.uk). R fits in very
>> well: "I be usin' Arrrgghhhh for my post processin'".
>> 
>> 
>> Keith Bannister

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111