[R] Nested for loop

Ben Tupper btupper at bigelow.org
Mon Aug 7 16:50:56 CEST 2017


Hmmm.

If I understand you correctly, your question has to do with adding lines to your graph?  If so, my ggplot2 skills are sort of floppy, but you could append your sampling results to your data frame (one for each sample set) and then simply add layers.  Sort of like this.

N <- 10
x <- 1:N
df <- data.frame(
	x = x,
	y1 = sample(x, N, replace = TRUE),
	y2 = sample(x, N, replace = TRUE),
	y3 = sample(x, N, replace = TRUE))
	
ggplot(df, mapping = aes(x = x, y = y1)) +
	geom_point(aes(y = y1), col = 'orange') + geom_line(aes(y = y1),col = 'orange') +
	geom_point(aes(y = y2), col = 'blue') + geom_line(aes(y = y2), col = 'blue') +
	geom_point(aes(y = y3), col = 'gray') + geom_line(aes(y = y3), col = 'gray')

If plotting is not the issue then I don't understand what your question is.

Cheers,
Ben


> On Aug 6, 2017, at 3:44 PM, Kirsten Morehouse <kmoreho1 at swarthmore.edu> wrote:
> 
> Hi Ben,
> 
> That's exactly right! Except for each set it's the sample population that is 400, 800 or 300. I want to take 3 samples, each of 100, where only the population differs. I can do this separately, but I'm having trouble putting them all on the same graph. 
> 
> I'd like to have sample on the x axis (1-300) and estimate on the y axis. I want to show how population affects the estimates. 
> 
> Does this make more sense?
> 
> Thanks for your time!
> 
> Kirsten 
> On Sun, Aug 6, 2017 at 3:21 PM Ben Tupper <btupper at bigelow.org <mailto:btupper at bigelow.org>> wrote:
> Hi Kirsten,
> 
> 
> 
> I can run your example code but I can't quite follow your division of sampling.  Can you restate the the task?  Below is what I think you are asking for, but I have the feeling I may be off the mark.
> 
> 
> 
> 
> 
> Set A: 400 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Set B: 800 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Set C: 300 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Ben
> 
> 
> 
> > On Aug 5, 2017, at 9:21 AM, Kirsten Morehouse <kmoreho1 at swarthmore.edu <mailto:kmoreho1 at swarthmore.edu>> wrote:
> 
> >
> 
> > Hi! Thanks for taking the time to read this.
> 
> >
> 
> > The code below creates a graph that takes 100 samples that are between 5%
> 
> > and 15% of the population (400).
> 
> >
> 
> > What I'd like to do, however, is add two other sections to the graph. It
> 
> > would look something like this:
> 
> >
> 
> > from 1-100 samples take 100 samples that are between 5% and 15% of the
> 
> > population (400). From 101-200 take 100 samples that are between 5% and 15%
> 
> > of the population (800). From 201-300 take 100 samples that are between 5%
> 
> > and 15% of the population (300).
> 
> >
> 
> > I assume this would require a nested for loop. Does anyone have advice as
> 
> > to how to do this?
> 
> >
> 
> > Thanks for your time. Kirsten
> 
> >
> 
> > ## Mark-Recapture
> 
> > ## Estimate popoulation from repeated sampling
> 
> >
> 
> > ## Population size
> 
> > N <- 400
> 
> > N
> 
> >
> 
> > ## Vector labeling each item in the population
> 
> > pop <- c(1:N)
> 
> > pop
> 
> >
> 
> > ## Lower and upper bounds of sample size
> 
> > lower.bound <- round(x = .05 * N, digits = 0)
> 
> > lower.bound ## Smallest possible sample size
> 
> >
> 
> > upper.bound <- round(x = .15 * N, digits = 0)
> 
> > upper.bound ## Largest possible sample size
> 
> >
> 
> > ## Length of sample size interval
> 
> > length.ss.interval <- length(c(lower.bound:upper.bound))
> 
> > length.ss.interval ## total possible sample sizes, ranging form lower.bound
> 
> > to upper.bound
> 
> >
> 
> > ## Determine a sample size randomly (not a global variable...simply for
> 
> > test purposes)
> 
> > ## Between lower and upper bounds set previously
> 
> > ## Give equal weight to each possible sample size in this interval
> 
> > sample(x = c(lower.bound:upper.bound),
> 
> >       size = 1,
> 
> >       prob = c(rep(1/length.ss.interval, length.ss.interval)))
> 
> >
> 
> > ## Specify number of samples to take
> 
> > n.samples <- 100
> 
> >
> 
> > ## Initiate empty matrix
> 
> > ## 1st column is population (item 1 thorugh item 400)
> 
> > ## 2nd through nth column are all rounds of sampling
> 
> > dat <- matrix(data = NA,
> 
> >              nrow = length(pop),
> 
> >              ncol = n.samples + 1)
> 
> >
> 
> > dat[,1] <- pop
> 
> >
> 
> > dat
> 
> >
> 
> > ## Take samples of random sizes
> 
> > ## Record results in columns 2 through n
> 
> > ## 1 = sampled (marked)
> 
> > ## 0 = not sampled (not marked)
> 
> > for(i in 2:ncol(dat)) {
> 
> >  a.sample <- sample(x = pop,
> 
> >                     size = sample(x = c(lower.bound:upper.bound),
> 
> >                                   size = 1,
> 
> >                                   prob = c(rep(1/length.ss.interval,
> 
> > length.ss.interval))),
> 
> >                     replace = FALSE)
> 
> >  dat[,i] <- dat[,1] %in% a.sample
> 
> > }
> 
> >
> 
> > ## How large was each sample size?
> 
> > apply(X = dat, MARGIN = 2, FUN = sum)
> 
> > ## 1st element is irrelevant
> 
> > ## 2nd element through nth element: sample size for each of the 100 samples
> 
> >
> 
> > ## At this point, all computations can be done using dat
> 
> >
> 
> > ## Create Schnabel dataframe using dat
> 
> > ## Google the Schnabel formula
> 
> >
> 
> > schnabel.comp <- data.frame(sample = 1:n.samples,
> 
> >                            n.sampled = apply(X = dat, MARGIN = 2, FUN =
> 
> > sum)[2:length(apply(X = dat, MARGIN = 2, FUN = sum))]
> 
> > )
> 
> >
> 
> > ## First column: which sample, 1-100
> 
> > ## Second column: number selected in that sample
> 
> >
> 
> >
> 
> > ## How many items were previously sampled?
> 
> > ## For 1st sample, it's 0
> 
> > ## For 2nd sample, code is different than for remaning samples
> 
> >
> 
> > n.prev.sampled <- c(0, rep(NA, n.samples-1))
> 
> > n.prev.sampled
> 
> >
> 
> > n.prev.sampled[2] <- sum(ifelse(test = dat[,3] == 1 & dat[,2] == 1,
> 
> >                                yes = 1,
> 
> >                                no = 0))
> 
> >
> 
> > n.prev.sampled
> 
> >
> 
> > for(i in 4:ncol(dat)) {
> 
> >  n.prev.sampled[i-1] <- sum(ifelse(test = dat[,i] == 1 &
> 
> > rowSums(dat[,2:(i-1)]) > 0,
> 
> >                                    yes = 1,
> 
> >                                    no = 0))
> 
> > }
> 
> >
> 
> > schnabel.comp$n.prev.sampled <- n.prev.sampled
> 
> >
> 
> > ## n.newly.sampled: in each sample, how many items were newly sampled?
> 
> > ## i.e., never seen before?
> 
> > schnabel.comp$n.newly.sampled <- with(schnabel.comp,
> 
> >                                      n.sampled - n.prev.sampled)
> 
> >
> 
> > ## cum.sampled: how many total items have you seen?
> 
> > schnabel.comp$cum.sampled <- c(0,
> 
> > cumsum(schnabel.comp$n.newly.sampled)[2:n.samples-1])
> 
> >
> 
> > ## numerator of schnabel formula
> 
> > schnabel.comp$numerator <- with(schnabel.comp,
> 
> >                                n.sampled * cum.sampled)
> 
> >
> 
> > ## denominator of schnable formula is n.prev.sampled
> 
> >
> 
> > ## pop.estimate -- after each sample (starting with 2nd -- need at least
> 
> > two samples)
> 
> > schnabel.comp$pop.estimate <- NA
> 
> >
> 
> > for(i in 1:length(schnabel.comp$pop.estimate)) {
> 
> >  schnabel.comp$pop.estimate[i] <- sum(schnabel.comp$numerator[1:i]) /
> 
> > sum(schnabel.comp$n.prev.sampled[1:i])
> 
> > }
> 
> >
> 
> >
> 
> > ## Plot population estimate after each sample
> 
> > if (!require("ggplot2")) {install.packages("ggplot2"); require("ggplot2")}
> 
> > if (!require("scales")) {install.packages("scales"); require("scales")}
> 
> >
> 
> >
> 
> > small.sample.dat <- schnabel.comp
> 
> >
> 
> > small.sample <- ggplot(data = small.sample.dat,
> 
> >                       mapping = aes(x = sample, y = pop.estimate)) +
> 
> >  geom_point(size = 2) +
> 
> >  geom_line() +
> 
> >  geom_hline(yintercept = N, col = "red", lwd = 1) +
> 
> >  coord_cartesian(xlim = c(0:100), ylim = c(300:500)) +
> 
> >  scale_x_continuous(breaks = pretty_breaks(11)) +
> 
> >  scale_y_continuous(breaks = pretty_breaks(11)) +
> 
> >  labs(x = "\nSample", y = "Population estimate\n",
> 
> >       title = "Sample sizes are between 5% and 15%\nof the population") +
> 
> >  theme_bw(base_size = 12) +
> 
> >  theme(aspect.ratio = 1)
> 
> >
> 
> > small.sample
> 
> >
> 
> >       [[alternative HTML version deleted]]
> 
> >
> 
> > ______________________________________________
> 
> > R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
> 
> > https://stat.ethz.ch/mailman/listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help>
> 
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html <http://www.r-project.org/posting-guide.html>
> 
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> Ben Tupper
> 
> Bigelow Laboratory for Ocean Sciences
> 
> 60 Bigelow Drive, P.O. Box 380
> 
> East Boothbay, Maine 04544
> 
> http://www.bigelow.org <http://www.bigelow.org/>
> 
> 
> 
> Ecocast Reports: http://seascapemodeling.org/ecocast.html <http://seascapemodeling.org/ecocast.html>
> 
> Tick Reports: https://report.bigelow.org/tick/ <https://report.bigelow.org/tick/>
> 
> Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/ <https://jellyfish.bigelow.org/jellyfish/>
> 
> 
> 
> 
> 
> 
> 

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecocast Reports: http://seascapemodeling.org/ecocast.html
Tick Reports: https://report.bigelow.org/tick/
Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/




	[[alternative HTML version deleted]]



More information about the R-help mailing list