[R] stratified sampling

Daniel Nordlund djnordlund at frontier.com
Sat Mar 8 00:31:31 CET 2014


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Kristi Glover
> Sent: Friday, March 07, 2014 11:56 AM
> To: R-help
> Subject: [R] stratified sampling
> 
> Hi R users,
> I have been struggling to select an equal number of samples from each
> stratum. I have data collected in different years in different regions
> with different sample sizes. Basically, I have two conditions (year and
> region). I want the same sample size for each year and region.
> I found that the "strata.sampling" package can be used if I had one condition, but
> I have two conditions. Is there any package in which I can specify two
> conditions, select the rows randomly 999 times, and report the mean value?
> 
> Your help would be really appreciated. I am spending so much time...
> 
> Here is what I did for the example data
> raw=structure(list(watershed = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
>     year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001,
>     2001, 2001, 2002, 2002, 2002), sp1 = c(18.38, 29.1, 90.72,
>     16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13,
>     45.71), sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
>     16.18, 18.16, 18.76, 19.26, 52.73, 49.09), sp3 = c(86.9,
>     62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36,
>     53.28, 67.42, 33.37)), .Names = c("watershed", "year", "sp1",
> "sp2", "sp3"), class = "data.frame", row.names = c(NA, -13L))
> 
>  require(sampling)
>   if (is.null(method)) method <- "srswor"
>   if (!method %in% c("srswor", "srswr"))
>     stop('method must be "srswor" or "srswr"')
>   temp <- data[order(data[[group]]), ]
>   ifelse(length(size) > 1,
>          size <- size,
>          ifelse(size < 1,
>                 size <- round(table(temp[group]) * size),
>                 size <- rep(size, times=length(table(temp[group])))))
>   strat = strata(temp, stratanames = names(temp[group]),
>                  size = size, method = method)
>   getdata(temp, strat)
> }
> 
> test1<-strata.sampling(raw, ("watershed"), 2)# select 2 rows by watershed
> 
> BUT, I wanted to use "year" too. ("watershed", "year"). When I added the
> "year", it did not work
> test1<-strata.sampling(raw, ("watershed", "year"), 2)# select 2 rows by
> watershed and year
> > test1<-strata.sampling(raw, ("watershed", "year"), 2)
> Error: unexpected ',' in "test1<-strata.sampling(raw, ("watershed","
> 
> Here I want to select rows using two conditions ("watershed", "year"),
> repeat the random sampling 999 times, and report the mean value of sp1, sp2, sp3.
> Here is the output I wanted
> output<-structure(list(watershed = structure(c(1L, 1L, 2L, 2L), .Label =
> c("A",
> "B"), class = "factor"), year = c(2001L, 2002L, 2001L, 2002L),
>     sp1 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
>     sp2 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
>     sp3 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class =
> "factor")), .Names = c("watershed",
> "year", "sp1", "sp2", "sp3"), class = "data.frame", row.names = c(NA,
> -4L))
> 
> Any suggestions?
> Thanks for your help.
> KG
> 

There seems to be something missing from your post (your code doesn't run as is, even for a single stratum variable). But I would hazard a guess that when you want to pass multiple strata variables, you need to pass them as a vector:

c('watershed','year')

And if you are passing multiple stratum variables, you also need to pass a vector of desired sample sizes in the order that the strata appear in your data.  In your case that would be

size = c(2,2,2,2)



If this doesn't solve the problem, then write back to the list with an example that works with a single variable with your data.
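
For comparison, here is a minimal sketch in base R that does not depend on the strata.sampling function (whose definition is incomplete in the post). It assumes 2 rows are drawn without replacement from each watershed-by-year stratum, that this is repeated 999 times, and that the result wanted is the average of the per-draw means. The names one_draw, n_per_stratum and n_rep are made up for illustration and are not from the original post.

```r
# Example data from the original post
raw <- data.frame(
  watershed = factor(c(rep("A", 7), rep("B", 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

species <- c("sp1", "sp2", "sp3")
n_per_stratum <- 2    # assumed sample size per stratum
n_rep <- 999          # number of random draws

# Draw n rows without replacement from one stratum; return the species means
one_draw <- function(d, n) {
  idx <- sample.int(nrow(d), n)
  colMeans(d[idx, species, drop = FALSE])
}

# One data frame per watershed-by-year combination
strata_list <- split(raw, list(raw$watershed, raw$year), drop = TRUE)

set.seed(1)  # only to make the sketch reproducible
res <- lapply(strata_list, function(d) {
  draws <- replicate(n_rep, one_draw(d, n_per_stratum))  # 3 x n_rep matrix
  data.frame(watershed = d$watershed[1], year = d$year[1],
             t(rowMeans(draws)))
})

output <- do.call(rbind, res)
output <- output[order(output$watershed, output$year), ]
rownames(output) <- NULL
output
```

Note that stratum A/2001 has only two rows, so every draw of size 2 returns the same two rows; a larger per-stratum sample size would need sampling with replacement or more data.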


Dan 

Daniel Nordlund
Bothell, WA USA



