[R] Randomly select elements based on criteria

Petr Savicky savicky at cs.cas.cz
Fri Mar 23 11:47:18 CET 2012


On Fri, Mar 23, 2012 at 10:56:11AM +0100, Petr Savicky wrote:
> On Thu, Mar 22, 2012 at 11:42:53AM -0700, aly wrote:
[...]
> > I want to randomly select two rows but they have to be from different fam.
> > The fist part (random selection), I got it by doing:
> > 
> > > ran <- sample(nrow (fish), size=2); ran
> > 
> > [1]  9 12
> > 
> > > newfish <- fish [ran,];  newfish
> > 
> >     fam born spawn
> > 103 136   46    50 
> > 106 142   46    85 
> > 
> > In this example I got two individuals from different families (good) but I
> > will repeat the process many times and there's a chance that I get two fish
> > from the same family (bad):
> > 
> > > ran<-sample (nrow(fish), size=2);ran
> > 
> > [1] 26 25
> > 
> > > newfish <-fish [ran,]; newfish
> > 
> >     fam born spawn
> > 127 150   46    85
> > 126 150   46    78
> > 
> > I need a conditional but I have no clue on how to include it in the code.
> 
> Hi.
> 
> Try the following.
> 
>   ran1 <- sample(nrow(fish), 1)
>   ind <- which(fish$fam !=  fish$fam[ran1])
>   ran2 <- ind[sample(length(ind), 1)]
>   fish[c(ran1, ran2), ]
> 
> This generates the pairs from exactly the same distribution as
> the rejection method suggested earlier, however, it does not
> contain a loop.

Hi.

I am sorry for a wrong statement. If there are more than two
families, then the distributions from the two methods are only
approximately equal, not exactly.

If the sizes of families are, say

  n <- c(20, 3, 3)
  p <- n/sum(n)

then the probability to a get a pair from families (i, j)
using the rejection method is p1[i, j], where p1 is

  p1 <- p %o% p
  diag(p1) <- 0
  p1 <- p1/sum(p1)
  p1 <- p1 + t(p1)
  p1[row(p1) >= col(p1)] <- 0
  p1

       [,1]      [,2]       [,3]
  [1,]    0 0.4651163 0.46511628
  [2,]    0 0.0000000 0.06976744
  [3,]    0 0.0000000 0.00000000

The above produces a pair from families (i, j) with probability
p2[i, j], where p2 is 

  p2 <- p %o% p
  diag(p2) <- 0
  p2 <- p2/rep(rowSums(p2), times=nrow(p2))*p
  p2 <- p2 + t(p2)
  p2[row(p2) >= col(p2)] <- 0
  p2

       [,1]      [,2]       [,3]
  [1,]    0 0.4849498 0.48494983
  [2,]    0 0.0000000 0.03010033
  [3,]    0 0.0000000 0.00000000

Petr Savicky.



More information about the R-help mailing list