[R] survey weights in sample with replacement

Mehtabul Azam mazam at smu.edu
Wed Oct 31 17:51:14 CET 2007


Thanks, Thomas! I am trying to draw a random sample from a household survey
that has 80,000 observations.
rural is the name of the dataset, and iwt is the survey weight assigned to
each observation.
These are the errors I get:

> z=sample(rural,5000,replace=TRUE, Prob=rural$iwt)
Error in sample(rural, 5000, replace = TRUE, Prob = rural$iwt) : 
        unused argument(s) (Prob = c(133, 133, 166, 166, 166, 166, 1047,
1047, 1047, 1047, 288, 623, 623, 240, 240, 432, 144, 144, 719, 719, 316,
342, 342, 816, 816, 105, 158, 158, 1105, 1105, 101, 557, 557, 405, 405, 101,
304, 304, 1165, 1165, 193, 771, 771, 1060, 1060, 482, 530, 530, 2024, 2024,
254, 254, 241, 241, 241, 241, 674, 674, 674, 674, 137, 137, 623, 623, 623,
623, 603, 603, 603, 603, 285, 556, 556, 970, 970, 285, 728, 728, 499, 499,
272, 1349, 1349, 218, 218, 272, 1240, 1240, 95, 95, 307, 307, 307, 307, 307,

> iwt=rural[,"iwt"]

> z=sample(rural,5000,replace=TRUE, Prob=iwt)
Error in sample(rural, 5000, replace = TRUE, Prob = iwt) : 
        unused argument(s) (Prob = c(133, 133, 166, 166, 166, 166, 1047,
1047, 1047, 1047, 288, 623, 623, 240, 240, 432, 144, 144, 719, 719, 316,
342, 342, 816, 816, 105, 158, 158, 1105, 1105, 101, 557, 557, 405, 405, 101,
304, 304, 1165, 1165, 193, 771, 771, 1060, 1060, 482, 530, 530, 2024, 2024,
254, 254, 241, 241, 241, 241, 674, 674, 674, 674, 137, 137, 623, 623, 623,
623, 603, 603, 603, 603, 285, 556, 556, 970, 970, 285, 728, 728, 499, 499,
272, 1349, 1349, 218, 218, 272, 1240, 1240, 95, 95, 307, 307, 307, 307, 307,


> iwt=as.vector(rural[,"iwt"])
> z=sample(rural,5000,replace=TRUE, Prob=iwt)
Error in sample(rural, 5000, replace = TRUE, Prob = iwt) : 
        unused argument(s) (Prob = c(133, 133, 166, 166, 166, 166, 1047,
1047, 1047, 1047, 288, 623, 623, 240, 240, 432, 144, 144, 719, 719, 316,
342, 342, 816, 816, 105, 158, 158, 1105, 1105, 101, 557, 557, 405, 405, 101,
304, 304, 1165, 1165, 193, 771, 771, 1060, 1060, 482, 530, 530, 2024, 2024,
254, 254, 241, 241, 241, 241, 674, 674, 674, 674, 137, 137, 623, 623, 623,
623, 603, 603, 603, 603, 285, 556, 556, 970, 970, 285, 728, 728, 499, 499,
272, 1349, 1349, 218, 218, 272, 1240, 1240, 95, 95, 307, 307, 307, 307, 307,



> summary(rural$iwt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1     400    1078    1894    2981   54320 
> 
I just want the random sample to look as close as possible to the population
(i.e., to the weighted proportions computed from the survey).
I thought sample() would automatically normalize the probability vector. I am
not sure I am reading this right; I might be totally off track.
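For reference, here is a minimal sketch of a call that avoids the two problems in the transcript above. It assumes a small hypothetical stand-in for the rural data frame (the column names hhid and iwt and the eight weights are illustrative only). The argument to sample() is spelled prob, lower case, and R matches argument names case-sensitively, so Prob is reported as unused. Passing the data frame itself to sample() would sample its columns rather than its rows, so the sketch draws row numbers and indexes with them.

```r
# Hypothetical stand-in for the 80,000-row 'rural' data frame.
rural <- data.frame(
  hhid = 1:8,
  iwt  = c(133, 133, 166, 166, 1047, 1047, 288, 623)
)

set.seed(1)
# 1. The argument is 'prob', not 'Prob': argument names are case-sensitive.
# 2. sample(rural, ...) would pick columns of the data frame, so draw
#    row numbers instead and use them to index the rows.
# 3. The weights are rescaled internally; they need not sum to 1.
idx <- sample(nrow(rural), 5000, replace = TRUE, prob = rural$iwt)
z <- rural[idx, ]

head(z)
```

Each row of z is one draw, so a household with a large iwt can appear many times.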

Regards,
Mehtab
-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
Sent: Wednesday, October 31, 2007 9:44 AM
To: Azam, Mehtabul 
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] survey weights in sample with replacement

On Tue, 30 Oct 2007, Azam, Mehtabul  wrote:

> Hi,
>       I am trying to draw a random sample from a household survey with
> sample weights. Is there any function in R or S-PLUS that allows this?
>

It depends on exactly what you want.

The sample() function will draw unequal probability samples with 
replacement.
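A quick way to see what "unequal probability samples with replacement" means in practice: each draw is independent, so unit i is expected to appear n * w[i] / sum(w) times. The sketch below uses a few hypothetical weights on the survey's scale (they do not sum to 1) and compares expected with observed counts.

```r
# Hypothetical weights on the survey's own scale (not normalized).
w <- c(133, 166, 1047, 288, 623)
n <- 100000

set.seed(123)
draws <- sample(length(w), n, replace = TRUE, prob = w)

# With replacement, draws are independent: expected count is n * w / sum(w).
expected <- n * w / sum(w)
observed <- tabulate(draws, nbins = length(w))
print(round(cbind(expected, observed)))
```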

sample() will also draw samples without replacement, but (as documented)
it uses sequential sampling, so for sample sizes greater than 1 the
resulting inclusion probabilities are not actually proportional to the
specified weights.
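A small illustration of the sequential-sampling point, assuming four hypothetical units with weights 1, 2, 3, 4 and n = 2, so the target inclusion probabilities n * w / sum(w) are 0.2, 0.4, 0.6, 0.8. Under sequential sampling a unit is included if it is picked first, or picked second from the remaining weights; the sketch computes that exactly and checks it against a simulation with sample().

```r
w <- c(1, 2, 3, 4)     # hypothetical weights
n <- 2
p <- w / sum(w)
target <- n * p        # 0.2 0.4 0.6 0.8

# Exact inclusion probability of unit i with two sequential draws:
# picked first (prob p[i]), or picked second after some unit j was picked
# first (prob p[j] * p[i] / (1 - p[j])).
seq_incl <- sapply(seq_along(p), function(i) p[i] + sum(p[-i] * p[i] / (1 - p[-i])))

# Monte Carlo check using sample() without replacement.
set.seed(42)
draws <- replicate(20000, sample(length(w), n, replace = FALSE, prob = w))
sim_incl <- tabulate(draws, nbins = length(w)) / ncol(draws)

print(round(rbind(target, exact = seq_incl, simulated = sim_incl), 3))
```

Unit 1 ends up included about 23.5% of the time instead of 20%, and unit 4 about 71.6% instead of 80%: the simulation tracks the sequential formula, not the target.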

The error in sequential sampling is pretty small, but it has attracted a 
lot of creativity in the survey literature (probably more than it 
deserves).  The 'sampling' package implements several algorithms for 
drawing unequal probability samples without replacement that really are 
proportional to the specified weights where this is achievable.
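As one example of a without-replacement design whose inclusion probabilities really are n * w / sum(w), here is a sketch of systematic PPS sampling in base R. The helper systematic_pps() is hypothetical and is not the 'sampling' package's code (that package provides its own functions, for example UPsystematic() and UPtille()); it assumes every n * w[i] / sum(w) is at most 1.

```r
# Hypothetical helper (not taken from the 'sampling' package): systematic PPS
# sampling without replacement with target inclusion probabilities
# pik = n * w / sum(w). Assumes every pik is at most 1.
systematic_pps <- function(w, n) {
  pik <- n * w / sum(w)
  if (any(pik > 1)) {
    stop("n * w / sum(w) exceeds 1 for some units; take them with certainty first")
  }
  # Lay the units end to end on (0, n]; unit i owns (cum[i], cum[i + 1]].
  cum <- c(0, cumsum(pik))
  cum[length(cum)] <- n          # guard against floating-point shortfall
  # One uniform random start, then n points spaced exactly 1 apart.
  points <- runif(1) + 0:(n - 1)
  # Each interval has length <= 1, so it contains at most one point:
  # the selected units are distinct and unit i is hit with probability pik[i].
  findInterval(points, cum, left.open = TRUE)
}

w <- c(1, 2, 3, 4)
set.seed(7)
reps <- replicate(20000, systematic_pps(w, 2))   # 2 x 20000 matrix of unit ids
freq <- tabulate(reps, nbins = length(w)) / ncol(reps)
print(round(rbind(target = 2 * w / sum(w), simulated = freq), 3))
```

The single random start is what makes the first-order inclusion probabilities exact, but it also means some pairs of units can never be selected together (their joint inclusion probability is zero), which complicates variance estimation; that trade-off is one reason several competing algorithms exist.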

 	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


