[R] stratified sampling

Bert Gunter gunter.berton at gene.com
Fri Mar 7 21:41:38 CET 2014


Why?

Presumably you want to bootstrap the distribution of the mean -- but
why? Anyway, if this is correct, the boot package can do this for you.
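Below is a minimal sketch of the boot approach. It assumes the goal is the bootstrap distribution of the per-watershed/year means of sp1, sp2 and sp3. It uses the `strata` argument of `boot()`, with the interaction of watershed and year as the strata. The data are re-typed from the dput() in the quoted question. The names `grp`, `cell.means` and `boot.means` are illustrative only.

Note that stratified bootstrap resampling keeps each stratum at its original size (2, 5, 3 and 3 rows here). It does not draw an equal number of rows per stratum.

```r
library(boot)  # recommended package shipped with R

# Example data from the quoted question (same values as its dput())
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# One stratum per watershed x year combination
grp <- interaction(raw$watershed, raw$year, drop = TRUE)

# Statistic: mean of sp1..sp3 within each stratum of the resampled rows.
# boot() passes the data and a vector i of resampled row indices.
cell.means <- function(d, i) {
  as.vector(sapply(d[i, c("sp1", "sp2", "sp3")],
                   function(x) tapply(x, grp[i], mean)))
}

set.seed(1)
# strata = grp: rows are resampled with replacement *within* each stratum
b <- boot(raw, cell.means, R = 999, strata = grp)

# Average of the 999 bootstrap means (rows: strata, columns: species)
boot.means <- matrix(colMeans(b$t), nrow = nlevels(grp),
                     dimnames = list(levels(grp), c("sp1", "sp2", "sp3")))
boot.means
```

`b$t` holds one row per replicate, so quantiles of its columns (or `boot.ci()`) describe the spread of each stratum mean.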

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Fri, Mar 7, 2014 at 11:56 AM, Kristi Glover
<kristi.glover at hotmail.com> wrote:
> Hi R users,
> I have been struggling to select an equal number of samples from each stratum. I have data collected in different years and in different regions, with different sample sizes. Basically, I have two conditions (year and region), and I want the same sample size for every combination of year and region.
> I found that the 'strata.sampling' function works if I have one condition, but I have two conditions. Is there any package in which I can specify two conditions, select rows randomly 999 times and compute the mean value?
>
> Your help would be really appreciated. I have been spending a lot of time on this...
>
> Here is what I did with the example data:
> raw=structure(list(watershed = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
>     year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001,
>     2001, 2001, 2002, 2002, 2002), sp1 = c(18.38, 29.1, 90.72,
>     16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13,
>     45.71), sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
>     16.18, 18.16, 18.76, 19.26, 52.73, 49.09), sp3 = c(86.9,
>     62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36,
>     53.28, 67.42, 33.37)), .Names = c("watershed", "year", "sp1",
> "sp2", "sp3"), class = "data.frame", row.names = c(NA, -13L))
>
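> library(sampling)
> strata.sampling <- function(data, group, size, method = NULL) {
>   # default: simple random sampling without replacement in each stratum
>   if (is.null(method)) method <- "srswor"
>   if (!method %in% c("srswor", "srswr"))
>     stop('method must be "srswor" or "srswr"')
>   # strata() expects the data sorted by the stratification variable;
>   # data[[group]] works with a single grouping column only
>   temp <- data[order(data[[group]]), ]
>   n.strata <- table(temp[group])
>   if (length(size) == 1) {
>     # a proportion (< 1) or a fixed number of rows, applied to every stratum
>     size <- if (size < 1) round(n.strata * size) else rep(size, length(n.strata))
>   }
>   strat <- strata(temp, stratanames = group, size = size, method = method)
>   getdata(temp, strat)
> }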
>
> test1 <- strata.sampling(raw, "watershed", 2) # select 2 rows per watershed
>
> But I want to stratify by "year" as well, i.e. by ("watershed", "year"). When I added "year", it did not work:
> test1<-strata.sampling(raw, ("watershed", "year"), 2)# select 2 rows by watershed and year
>> test1<-strata.sampling(raw, ("watershed", "year"), 2)
> Error: unexpected ',' in "test1<-strata.sampling(raw, ("watershed","
>
> I want to select rows by the two conditions ("watershed", "year"), repeat the random sampling 999 times, and report the mean values of sp1, sp2 and sp3. Here is the output I want:
> output<-structure(list(watershed = structure(c(1L, 1L, 2L, 2L), .Label = c("A",
> "B"), class = "factor"), year = c(2001L, 2002L, 2001L, 2002L),
>     sp1 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
>     sp2 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
>     sp3 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor")), .Names = c("watershed",
> "year", "sp1", "sp2", "sp3"), class = "data.frame", row.names = c(NA,
> -4L))
>
> Any suggestions?
> Thanks for your help.
> KG
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
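Below is a minimal base-R sketch for the two-condition case, under stated assumptions.

The error in the question is a syntax error: `("watershed", "year")` is not a valid R expression, and several column names must be combined with `c("watershed", "year")`. Even with `c()`, `strata.sampling()` as written uses `data[[group]]`, which only works with a single column name.

The sketch below does not use the sampling package. `sample.strata()` and `draw.means()` are hypothetical helpers, and the data are re-typed from the dput() in the question. "999 times and put the mean value" is read here as follows (an assumption):

- draw 2 rows from each watershed/year cell;
- compute the cell means of sp1, sp2 and sp3;
- repeat 999 times and average those means.

```r
# Example data from the question (same values as its dput())
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# Hypothetical helper (not from any package): draw `size` rows without
# replacement from every combination of the columns named in `group`
sample.strata <- function(data, group, size) {
  # split() on a list of columns groups the row numbers by their interaction
  cells <- split(seq_len(nrow(data)), data[group], drop = TRUE)
  idx <- unlist(lapply(cells, function(rows) rows[sample.int(length(rows), size)]),
                use.names = FALSE)
  data[sort(idx), ]
}

set.seed(42)
# Several column names are combined with c(), not ("...", "...")
test1 <- sample.strata(raw, c("watershed", "year"), 2)

# One stratified draw, summarised as the mean of sp1..sp3 per cell
# (assumption: 2 rows per cell, because the A/2001 cell has only 2 rows)
draw.means <- function() {
  s <- sample.strata(raw, c("watershed", "year"), 2)
  aggregate(cbind(sp1, sp2, sp3) ~ watershed + year, data = s, FUN = mean)
}

# Repeat 999 times; each replicate contributes one row per cell, so the
# mean over all rows of a cell is the average of its 999 cell means
reps <- replicate(999, draw.means(), simplify = FALSE)
output <- aggregate(cbind(sp1, sp2, sp3) ~ watershed + year,
                    data = do.call(rbind, reps), FUN = mean)
output <- output[order(output$watershed, output$year), ]
rownames(output) <- NULL
output
```

Because the A/2001 cell has only two rows, its sampled mean is identical in every replicate. Drawing more than two rows per cell would require sampling with replacement (`replace = TRUE` in `sample.int()`).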
