[R] indexing??

Petr Savicky savicky at cs.cas.cz
Wed Feb 29 09:40:43 CET 2012


On Tue, Feb 28, 2012 at 11:42:32AM -0800, helin_susam wrote:
> Dear Petr Savicky,
> 
> Actually, this is based on the jackknife-after-bootstrap algorithm. In summary:
> 
> I have a data set, and I want to compute some values by using this
> algorithm.
> 
> Firstly, using the bootstrap, I create some bootstrap re-samples. This step is OK.
> Then, for each data point within these re-samples, I want to get a subset

The point y[j], which you are searching in the generated samples, is
not from "these re-samples", but from the original data set.

> which do not contain that data point (this point could be any point of the
> original data set). In general, if B is the number of bootstrap re-samples,
> about B/e re-samples are obtained for each data point.

Your previous explanations were more accurate on this point and
implied that you want to take all resamples which miss at least
one of y[j].
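For reference, the B/e figure is the large-sample limit of the expected number of resamples that miss a given observation, assuming each resample draws n row indices uniformly with replacement:

```latex
P(\text{row } j \notin \text{resample}) = \left(1 - \frac{1}{n}\right)^{n} \to e^{-1} \approx 0.368 \quad (n \to \infty)

E\left[\#\{b : \text{resample } b \text{ misses row } j\}\right] = B\left(1 - \frac{1}{n}\right)^{n} \approx \frac{B}{e}
```

With n = 10 the exact probability is 0.9^10, about 0.349, and the actual count for a given j is random (binomial with B trials), so B/e is only an approximate, average figure.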

>  And finally, I want
> to calculate some values for each of these re-samples.

> Explanation of my algorithm:
> 
> #My data set: (x and y)
> y <- c(1,2,3,4,5,6,7,8,9,10)
> x <- c(1,0,0,1,1,0,0,1,1,0)
> 
> n <- length(x)
> 
> t <- matrix(cbind(y,x), ncol=2)
> 
> z = x+y
> 
> for(j in 1:length(x)) {
> out <- vector("list", )
> 
> for(i in 1:10) {
> 
> t.s <- t[sample(n,n,replace=T),] # Here is the bootstrap step
> 
> y.s <- t.s[,1]
> x.s <- t.s[,2]
> 
> z.s <- y.s+x.s
> nn <- sum (z.s)  # For example, I want to calculate this value
> 
> out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j])) # I get the mentioned subset in here
> kk <- sapply(out, function(x) {x$finding})
> ff <- out[! kk]
> }
> }

You did not reply to the question concerning regenerating "out"
for each "j" and using "<-" inside a list. This makes the discussion
complicated.
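A small illustration of these two points (a sketch; the names nn, ff, finding and out mirror the quoted code, while the value 47 and the three-step loop are made up for the example):

```r
# "<-" inside list() assigns a variable as a side effect and leaves
# the list element unnamed; "=" names the element instead.
nn <- 47
out_arrow <- list(ff <- (nn), finding = FALSE)
out_equal <- list(ff = nn, finding = FALSE)

names(out_arrow)   # "" and "finding"; ff was assigned on the side
names(out_equal)   # "ff" and "finding"

# Re-creating "out" at the start of each pass over j discards the
# results of all earlier passes; only the last j survives.
res <- NULL
for (j in 1:3) {
  out <- vector("list")   # reset on every j
  out[[1]] <- j
  res <- out
}
res[[1]]           # 3, i.e. only the pass with j = 3 is kept
```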

The following code is equivalent to your code.

  y <- c(1,2,3,4,5,6,7,8,9,10)
  x <- c(1,0,0,1,1,0,0,1,1,0)
  n <- length(x)
  tt <- unname(cbind(y,x)) # do not overwrite function t()
  z <- x+y
 
  # needed only to advance the random number stream past the 10*(n-1)
  # resamples that your code draws for j = 1, ..., n-1
  for (j in 1:(10*(n-1))) sample(n,n,replace=TRUE)
 
  j <- length(x)
  out <- vector("list")
  for(i in 1:10) {
      tt.s <- tt[sample(n,n,replace=TRUE),] # Here is the bootstrap step
 
      y.s <- tt.s[,1]
      x.s <- tt.s[,2]
 
      z.s <- y.s+x.s
      nn <- sum(z.s)  # For example, I want to calculate this value
 
      out[[i]] <- list((nn), finding=any(y.s==y[j])) # I get the mentioned subset in here
  }
  kk <- sapply(out, function(x) {x$finding})
  ff <- out[! kk]

You can check the equivalence by running both versions, each preceded
by the same command set.seed(seed). I tried this, and the resulting "ff"
were identical for several different values of "seed".
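One way to run this check, sketched under the assumption that each version is wrapped in a function taking the seed (the wrapper names original_version and petr_version, and the seeds 1, 42 and 2012, are hypothetical choices):

```r
# Original code, wrapped; vector("list", ) is written as vector("list"),
# which creates the same empty list.
original_version <- function(seed) {
  set.seed(seed)
  y <- c(1,2,3,4,5,6,7,8,9,10)
  x <- c(1,0,0,1,1,0,0,1,1,0)
  n <- length(x)
  t <- matrix(cbind(y,x), ncol=2)
  for (j in 1:length(x)) {
    out <- vector("list")                # re-created for every j
    for (i in 1:10) {
      t.s <- t[sample(n, n, replace=TRUE), ]
      y.s <- t.s[,1]
      x.s <- t.s[,2]
      nn <- sum(y.s + x.s)
      out[[i]] <- list(ff <- (nn), finding=any(y.s==y[j]))
      kk <- sapply(out, function(x) {x$finding})
      ff <- out[! kk]
    }
  }
  ff
}

# Simplified version, wrapped the same way.
petr_version <- function(seed) {
  set.seed(seed)
  y <- c(1,2,3,4,5,6,7,8,9,10)
  x <- c(1,0,0,1,1,0,0,1,1,0)
  n <- length(x)
  tt <- unname(cbind(y,x))
  for (j in 1:(10*(n-1))) sample(n, n, replace=TRUE)  # skip 90 draws
  j <- length(x)
  out <- vector("list")
  for (i in 1:10) {
    tt.s <- tt[sample(n, n, replace=TRUE), ]
    y.s <- tt.s[,1]
    x.s <- tt.s[,2]
    nn <- sum(y.s + x.s)
    out[[i]] <- list((nn), finding=any(y.s==y[j]))
  }
  kk <- sapply(out, function(x) {x$finding})
  out[! kk]
}

seeds <- c(1, 42, 2012)
same <- sapply(seeds, function(s) identical(original_version(s), petr_version(s)))
same
```

Both functions consume the random number stream in the same order (100 calls to sample() in total), so the last 10 resamples coincide.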

This shows that the output depends only on the last pass of the
loop over j, i.e. j = length(x). Searching for the values y[j] for
j = 1, ..., length(x)-1 does not influence the result.

In other words, the output of your code consists of those of the 10
samples that do not contain y[10] (the last element of y). The tests
for the presence of y[1:9] in the samples are performed in your code,
but their results are later overwritten, so they do not influence
the output.
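If the intended result is instead, for every observation j, the statistic values of those bootstrap resamples that do not contain observation j, one possible structure is sketched below. It assumes, as in jackknife-after-bootstrap, that B resamples are drawn once and shared by all j; B = 10, the seed, and the names idx, stat and left_out are hypothetical. It stores the sampled row indices rather than the sampled values:

```r
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
B <- 10  # number of bootstrap resamples (assumed)

set.seed(1)
# Draw all resamples once; row b of idx holds the row indices of resample b.
idx <- t(replicate(B, sample(n, n, replace = TRUE)))

# Statistic for each resample: here sum(y + x) over the resampled rows.
stat <- apply(idx, 1, function(i) sum(y[i] + x[i]))

# For each observation j, keep the statistics of the resamples whose
# index set does not contain j. The result is a list of length n.
left_out <- lapply(seq_len(n), function(j) {
  keep <- !apply(idx, 1, function(i) any(i == j))
  stat[keep]
})
```

Because membership is tested on indices (i == j) rather than on values (y.s == y[j]), repeated values in y do not affect which resamples are selected.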

Is this what you want?

> I obtained the following results from an experiment:
> 
> > kk
>  [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
> > ff
> [[1]]
> [[1]][[1]]
> [1] 47
> 
> [[1]]$finding
> [1] FALSE
> 
> 
> [[2]]
> [[2]][[1]]
> [1] 46
> 
> [[2]]$finding
> [1] FALSE
> 
> 
> [[3]]
> [[3]][[1]]
> [1] 52
> 
> [[3]]$finding
> [1] FALSE
> 
> It is easy to do when "y" contains distinct elements:
> "out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j]))"
> 
> But when y contains repeated elements, this process can be confusing,
> because (y <- c(1,1,1,0,0,1,0,1,0,0)) for y[j] with j = 1 there are some
> other 1s in y. Is there something special about the y to an j?

This question is unclear to me.
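If the question is about repeated values, one possible reading is that a test on values cannot tell observation j apart from other observations with the same value. A small illustration, assuming a hand-picked resample (the index vector i is made up) drawn from the y given in the question:

```r
y <- c(1,1,1,0,0,1,0,1,0,0)           # y with repeated values, from the question
i <- c(2, 2, 3, 6, 8, 8, 2, 3, 6, 2)  # hypothetical resample that omits row 1
y.s <- y[i]

any(y.s == y[1])  # TRUE: other rows also have the value 1
any(i == 1)       # FALSE: row 1 itself was not drawn
```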

There are some problems in your code, which I tried to explain repeatedly
in the previous emails. Without clarifying these things, I am not able
to provide any help.

Petr Savicky.



More information about the R-help mailing list