[R] brewing stats

paul sorenson (sosman) sourceforge at metrak.com
Sun Oct 23 13:52:21 CEST 2005


I guess this isn't so much of a help request as a show-and-tell from a 
non-statistician homebrewer who has been fumbling around with R.  If 
nothing else it provides yet another data set.  I hope it is not out of 
line.

Anyway, the plots I have produced are at

	http://brewiki.org/BatchSparge#poll

The polling method is somewhat simple, its just one of those multiple 
choice style polls you can create on various web forums.

The poll was prompted by the ongoing claim from fly spargers that 
"their" method is more efficient, but I had never seen data to support 
that.  I thought maybe it was a bit of snobbery.

Maybe they are right.  However if I conveniently ignore that annoying 
bump on the left of the batch sparge histogram then the two groups start 
to look very similar.

I was going to go out on a limb and say I learn heaps from reading the 
posts here so please don't ruin my delusion too much if my output 
violates all principles of good statistics. OTOH if you can suggest 
other cool looking graphs please feel free.  The more difficult to 
pronounce the names are, the better :-)

The data set is (efficiency is the low end of its bin):

method	efficiency	count	source
fly	95	0	bb
fly	90	0	bb
fly	85	2	bb
fly	80	8	bb
fly	75	13	bb
fly	70	8	bb
fly	65	3	bb
fly	60	0	bb
fly	55	0	bb
batch	95	0	bb
batch	90	0	bb
batch	85	4	bb
batch	80	3	bb
batch	75	15	bb
batch	70	10	bb
batch	65	6	bb
batch	60	7	bb
batch	55	1	bb

And the R code:

# Crunch some stuff with brewboard (and similar polls).
# Data is already tabulated.

x = read.csv("SpargeEff.csv")

# Shift value to centre of bin
x$efficiency = x$efficiency + 2.5
# Ignore rows with no votes (NA), zeros are ok though
y = x[which(!is.na(x$count)),]
r = rep(row.names(y), y$count)
z = y[r,]
z$count = 1

par(mfrow=c(2,2))
barplot(table(z$method), main="number of responses")
barplot(table(z$method, z$efficiency), beside=T, legend=T, main="Mash 
efficiency by method", sub="paul sorenson 2005 brewiki.org")
boxplot(z$efficiency ~ z$method, main="Mash efficiency")
z.h = hist(z$efficiency, prob=T, main="Efficiencies,\n all methods 
combined", xlab="efficiency")
z.md = max(z.h$density)
lines(density(z$efficiency, bw=3.0), col='blue')
#qqnorm(x$efficiency)
t.test(efficiency ~ method, data=z)
#by(z, z$method, summary)
zs = split(z, z$method)
summary(zs$batch)
summary(zs$fly)

# fit a normal distribution
require(MASS)
z.fit = fitdistr(z$efficiency, 'normal')
q = 55:95 + 2.5
lines(q, dnorm(q, z.fit$estimate['mean'], z.fit$estimate['sd']), col='red')
legend('topleft', legend=c('density', 'fitted'), col=c('blue', 'red'), 
lwd=1, inset=0.05)

# Factor out lowball values.
z.f = z[which(z$efficiency >= 65),]
by(z.f, z.f$method, summary)




More information about the R-help mailing list