[R] Anomaly in sample() function

Wolfgang Polasek wolfgang.polasek at gmail.com
Mon Jul 13 15:04:07 CEST 2009


Hi all

Maybe someone knows a way to solve this anomaly in sample():
I would like to draw samples (n = 100) repeatedly from a population of
2500 units, but when I draw repeated samples I don't get what seems to be
a representative sample if I look at other partitions of the population.
Enclosed are the population g99, with 4 columns (unit, partition 1 (site),
partition 2 (type), weights), and my R program.

The problem: some categories of partition 2 (type), which I use to check
for representativeness, deviate by up to 20 percentage points from the
population.
As a criterion I have computed the mean absolute difference and the SD of
the differences in relative frequencies between sample and population.
What mean deviation is to be expected?
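As a rough benchmark for this question, assume simple random sampling without replacement (equal selection probabilities, unlike the weighted draw in the program below). The sample share $\hat p_k$ of a category with population share $p_k$ then has the usual hypergeometric spread, and under a normal approximation the expected absolute deviation is about $\sqrt{2/\pi} \approx 0.80$ times that SD:

```latex
\operatorname{SD}(\hat p_k) = \sqrt{\frac{p_k(1-p_k)}{n}\cdot\frac{N-n}{N-1}},
\qquad
\operatorname{E}\lvert \hat p_k - p_k \rvert \approx \sqrt{\tfrac{2}{\pi}}\;\operatorname{SD}(\hat p_k)

% worst case p_k = 0.5, with n = 100, N = 2500:
\operatorname{SD}(\hat p_k) = \sqrt{\frac{0.25}{100}\cdot\frac{2400}{2499}} \approx 0.049
```

So even in the worst case the SD is about 4.9 percentage points and the expected absolute deviation about 3.9 points; a 20-point deviation would be roughly four SDs, which is hard to explain as sampling noise under equal-probability sampling.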

Thanks for any ideas,
W. Polasek

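# g99: 2500 units; columns: unit, site (partition 1), type (partition 2), weight
rownames(g99) <- as.character(g99[, 1])   # index rows by unit id

# draw 100 units without replacement, sequentially with prob. proportional to weight
s1 <- g99[as.character(sample(g99[, 1], 100, replace = FALSE, prob = g99[, 4])), 1:4]
dim(s1)
j2g <- table(g99[, 3])                              # pop density
j2  <- table(factor(s1[, 3], levels = names(j2g)))  # sample density; keeps empty categories so j2 and j2g align
chisq.test(j2, p = j2g / sum(j2g))                  # goodness of fit of sample counts to pop shares

p2 <- 100 * j2g / sum(j2g)      # rel. frequency in pop (%)
pd <- p2 - 100 * j2 / sum(j2)   # difference of rel. frequency between pop and sample
round(rbind(j2g, p2, pd), 2)
sum(abs(pd)); mean(abs(pd)); sd(pd)   # look for the 'best' representative sample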
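Since the question is what deviation to expect, a Monte Carlo check can answer it directly for a given population. The sketch below is only illustrative: `pop` is a hypothetical stand-in for g99 (the attached data are not available here), in which type D units are assumed to carry three times the weight of the others, and `one_draw`, `summ`, `sim_w`, `sim_u` are made-up names. It repeats the weighted draw from the program above and, for comparison, an equal-probability draw, recording pd each time.

```r
# Monte Carlo sketch: how far do repeated samples of n = 100 deviate from
# the population 'type' shares? 'pop' is a HYPOTHETICAL stand-in for g99.
set.seed(42)
N <- 2500
type <- sample(c("A", "B", "C", "D"), N, replace = TRUE,
               prob = c(0.4, 0.3, 0.2, 0.1))
pop <- data.frame(
  unit   = seq_len(N),
  site   = sample(1:10, N, replace = TRUE),                  # partition 1
  type   = type,                                             # partition 2
  weight = rgamma(N, shape = 2) * ifelse(type == "D", 3, 1)  # assumed: type D weighted up
)

lev   <- sort(unique(pop$type))
p_pop <- 100 * prop.table(table(factor(pop$type, levels = lev)))  # pop shares (%)

# One draw of n units without replacement; weighted = TRUE mimics
# sample(..., prob = weights) from the program above.
# Returns pd = pop share - sample share (percentage points) per category.
one_draw <- function(n = 100, weighted = TRUE) {
  idx   <- sample(nrow(pop), n, replace = FALSE,
                  prob = if (weighted) pop$weight else NULL)
  p_smp <- 100 * prop.table(table(factor(pop$type[idx], levels = lev)))
  setNames(as.numeric(p_pop - p_smp), lev)
}

B     <- 1000
sim_w <- replicate(B, one_draw(weighted = TRUE))   # categories x B matrix
sim_u <- replicate(B, one_draw(weighted = FALSE))

# Averages of the criteria used in the post, plus the worst single deviation.
summ <- function(sim) c(
  sum_abs_pd  = mean(colSums(abs(sim))),
  mean_abs_pd = mean(abs(sim)),
  sd_pd       = mean(apply(sim, 2, sd)),
  max_abs_pd  = max(abs(sim))
)
round(rbind(weighted = summ(sim_w), unweighted = summ(sim_u)), 2)

# Mean pd per category: near 0 for equal-probability draws, but biased
# for weighted draws whenever the weights differ between types.
round(rbind(weighted = rowMeans(sim_w), unweighted = rowMeans(sim_u)), 2)
```

With real data, substitute g99 for `pop` (with the columns named unit, site, type, weight). If the per-category mean of pd is far from zero for the weighted draws but not for the unweighted ones, the deviations come from the weights themselves: a draw with prob = weights tends to reproduce each type's share of the total weight (approximately, for a small sampling fraction), not its share of the units.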

