[R] Randomly select elements based on criteria
Peter Ehlers
ehlers at ucalgary.ca
Fri Mar 23 00:58:08 CET 2012
Here's another way:
With d1 as your data frame,
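library(plyr)
# step 1: keep one randomly chosen row per family
d2 <- ddply(d1, .(fam), function(x) x[sample(nrow(x), 1), ])
# step 2: pick two of those rows; they necessarily come from different families
d2[sample(nrow(d2), 2), ]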
If you have to take account of the 'only one family' case, you
can wrap this in a function with an appropriate check:
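# note: assigning this function to 'fish' overwrites a data frame
# of the same name in the workspace, so pick another name if needed
fish <- function(d){
  if(length(unique(d[,'fam'])) < 2) stop('only one family')
  d2 <- ddply(d, .(fam), function(x) x[sample(nrow(x), 1), ])
  d2[sample(nrow(d2), 2), ]
}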
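
If plyr is not at hand, the same two-step idea (one random fish per family, then two of those) can probably be written in base R as well. The following is only a rough sketch: the name fish_base is made up for illustration, and d1 here is a toy data frame built from the first few rows of the example table, not the full data. Note that with this approach every family is equally likely to be picked, whereas the while loop quoted below samples among fish, so larger families show up more often.

```r
# Toy data: first rows of the example table from the original question
d1 <- data.frame(
  fam   = c(25, 25, 26, 43, 131, 133, 136, 136),
  born  = 46,
  spawn = c(43, 56, 50, 43, 43, 64, 43, 42)
)

# fish_base is a hypothetical name, not from the thread
fish_base <- function(d) {
  if (length(unique(d$fam)) < 2) stop("only one family")
  # step 1: one random row index per family
  one_per_fam <- vapply(
    split(seq_len(nrow(d)), d$fam),
    function(i) i[sample.int(length(i), 1)],
    integer(1)
  )
  # step 2: two of those row indices, so the families must differ
  d[one_per_fam[sample.int(length(one_per_fam), 2)], ]
}

set.seed(1)
pair <- fish_base(d1)
```

sample.int() is used instead of sample() so that a family with a single row is not misread as "sample from 1:x".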
Peter Ehlers
On 2012-03-22 16:03, Jorge I Velez wrote:
> You could keep the loop from running forever by introducing a stop() check.
> Here is an example using Dr. Savicky's code:
> # function to sample B pairs of
> # fishes from different families
> # -- d has columns fam, born, spawn
> foo <- function(d, B){
>
>   # internal function (this local foo masks the outer foo inside the body)
>   foo <- function(d){
>     if(length(unique(d[, 'fam'])) < 2) stop('only one family!')
>     while (1) {
>       ran <- sample(NROW(d), size = 2)
>       if (d[ran[1], 'fam'] != d[ran[2], 'fam']) break
>     }
>     d[ran, ]
>   }
>   # sampling B pairs of fishes
>   lapply(seq_len(B), function(i) foo(d))
> }
>
> # example: 2 pairs of fishes from different families
> foo(fish, 2)
>
> # data with only one family
> ff<- fish[1,]
> foo(ff, 2) # Error in foo(d) : only one family!
>
> HTH,
> Jorge.-
> On Thu, Mar 22, 2012 at 5:27 PM, Petr Savicky wrote:
>
>> On Thu, Mar 22, 2012 at 11:42:53AM -0700, aly wrote:
>>> Hi,
>>>
>>> I want to randomly pick 2 fish born the same day but I need those
>>> individuals to be from different families. My table includes 1787 fish
>>> distributed in 948 families. An example of a subset of fish born in one
>>> specific day would look like:
>>>
>>>> fish
>>>
>>> fam born spawn
>>> 25 46 43
>>> 25 46 56
>>> 26 46 50
>>> 43 46 43
>>> 131 46 43
>>> 133 46 64
>>> 136 46 43
>>> 136 46 42
>>> 136 46 50
>>> 136 46 85
>>> 137 46 64
>>> 142 46 85
>>> 144 46 56
>>> 144 46 64
>>> 144 46 78
>>> 144 46 85
>>> 145 46 64
>>> 146 46 64
>>> 147 46 64
>>> 148 46 78
>>> 149 46 43
>>> 149 46 98
>>> 149 46 85
>>> 150 46 64
>>> 150 46 78
>>> 150 46 85
>>> 151 46 43
>>> 152 46 78
>>> 153 46 43
>>> 156 46 43
>>> 157 46 91
>>> 158 46 42
>>> Where "fam" is the family that fish belongs to, "born" is the day it was
>>> born (in this case day 46), and "spawn" is the day it was spawned. I want to
>>> know if there is a correlation in the day of spawn between fish born the
>>> same day but that are unrelated (not from the same family).
>>> I want to randomly select two rows but they have to be from different
>>> families. The first part (random selection) I got by doing:
>>>
>>>> ran<- sample(nrow (fish), size=2); ran
>>>
>>> [1] 9 12
>>>
>>>> newfish<- fish [ran,]; newfish
>>>
>>> fam born spawn
>>> 103 136 46 50
>>> 106 142 46 85
>>>
>>> In this example I got two individuals from different families (good) but I
>>> will repeat the process many times and there's a chance that I get two fish
>>> from the same family (bad):
>>>
>>>> ran<-sample (nrow(fish), size=2);ran
>>>
>>> [1] 26 25
>>>
>>>> newfish<-fish [ran,]; newfish
>>>
>>> fam born spawn
>>> 127 150 46 85
>>> 126 150 46 78
>>>
>>> I need a conditional but I have no clue how to include it in the code.
>>
>> Hi.
>>
>> Try the following.
>>
>> while (1) {
>>   ran <- sample(nrow(fish), size=2)
>>   if (fish[ran[1], 1] != fish[ran[2], 1]) break
>> }
>> fish[ran, ]
>>
>> This will generate only pairs from different families. However,
>> note that the loop will run forever if the data contain only
>> fish from one family.
>>
>> Hope this helps.
>>
>> Petr Savicky.
>>
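
For a small per-day subset, the retry loop in the original suggestion could also be replaced by listing every admissible pair once and drawing one of them, which cannot loop forever. This is only a sketch under assumptions: pick_pair_enum is a made-up name, and the fish data frame below is a toy subset of the example table. Like the loop, it picks uniformly among pairs of fish from different families, but the number of candidate pairs grows as n(n-1)/2, so it is meant for modest n.

```r
# Toy data: a few rows of the example table from the question
fish <- data.frame(
  fam   = c(25, 25, 26, 43, 136, 136),
  born  = 46,
  spawn = c(43, 56, 50, 43, 43, 42)
)

# pick_pair_enum is a hypothetical name, not from the thread
pick_pair_enum <- function(d) {
  if (nrow(d) < 2) stop("need at least two fish")
  # all unordered pairs of row indices, one pair per column
  pairs <- combn(nrow(d), 2)
  # keep only the pairs whose families differ
  ok <- d$fam[pairs[1, ]] != d$fam[pairs[2, ]]
  valid <- pairs[, ok, drop = FALSE]
  if (ncol(valid) == 0) stop("only one family")
  # draw one admissible pair at random and return its two rows
  d[valid[, sample.int(ncol(valid), 1)], ]
}

set.seed(42)
newfish <- pick_pair_enum(fish)
```

Calling pick_pair_enum() inside replicate() or lapply() would give repeated draws, at the cost of rebuilding the pair table each time; for many draws it may be better to compute the valid pairs once outside the function.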
