[R] Conditional Random selection

Ashta sewashm at gmail.com
Sat Nov 21 20:52:53 CET 2015


Hi Bert and all,
I have a related question. In each time period, samples were collected
at different locations (S1). I want to count the number of unique
locations (S1) for each time period. For example, in time 1 the samples
were collected from two locations, in time 2 from only one location, and
in time 3 from three locations.

tab <- read.table(text = "
time S1 rep
1    1  1
1    2  1
1    2  2
2    1  1
2    1  2
2    1  3
2    1  4
3    1  1
3    2  1
3    3  1", header = TRUE)

What I want is:

time S1
   1  2
   2  1
   3  3
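One possible way to get this table in base R is sketched below. It is an illustrative suggestion, not a reply from the thread: it re-creates the example data so it runs on its own, then uses `aggregate()` with a formula to group `S1` by `time` and counts distinct locations with `length(unique(x))`.

```r
# Example data from the question above (time period, location S1, replicate rep)
tab <- read.table(text = "
time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# For each time period, count the number of distinct locations (S1)
n_loc <- aggregate(S1 ~ time, data = tab, FUN = function(x) length(unique(x)))
print(n_loc)
# expected counts: time 1 -> 2, time 2 -> 1, time 3 -> 3
```

If a named vector is more convenient than a data frame, `tapply(tab$S1, tab$time, function(x) length(unique(x)))` gives the same counts.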

Thank you again.



On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewashm at gmail.com> wrote:
> Thank you, Bert!
>
> What I want is at least 500 samples, obtained by randomly sampling time
> periods, so that samples collected in the same time period are
> included together.
>
> Your script does exactly what I wanted!
>
> Many thanks
>
> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> David's "solution" is incorrect. It can also fail to give you times
>> with a total of 500 items to sample from in the time periods.
>>
>> It is not entirely clear what you want. The solution below gives you a
>> random sample of time periods in which X1>0 and the total number of
>> samples among them is >= 500. It does not give you the fewest number
>> of periods that can do this. Is this what you want?
>>
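>> tab[with(tab, {
>>   rownums <- sample(seq_len(nrow(tab))[X1 > 0])  # eligible rows in random order
>>   sz <- cumsum(X2[rownums])                        # running total of samples
>>   rownums[c(TRUE, head(sz, -1) < 500)]             # keep rows until the total first reaches 500
>> }), ]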
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>    -- Clifford Stoll
>>
>>
>> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>>> Thank you, David!
>>>
>>> I reran your script and it gives me the first three time periods.
>>> Is it doing random sampling?
>>>
>>>       tab.fan
>>>   time X1  X2
>>> 2    2  5 230
>>> 3    3  1 300
>>> 5    5  2  10
>>>
>>>
>>>
>>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>>>> Use dput() to send data to the list as it is more compact:
>>>>
>>>>> dput(tab)
>>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
>>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>>
>>>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>>>
>>>>> tab.sub <- tab[tab$X1>0, ]
>>>>
>>>> Then the following gives you a sample:
>>>>
>>>>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>>
>>>> Note that your "solution" of times 6, 7, and 8 will never appear, because their X2 values sum to 586 (more than 500).
>>>>
>>>>
>>>> David L. Carlson
>>>> Department of Anthropology
>>>> Texas A&M University
>>>>
>>>> -----Original Message-----
>>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>>>> Sent: Saturday, November 21, 2015 11:53 AM
>>>> To: R help <r-help at r-project.org>
>>>> Subject: [R] Conditional Random selection
>>>>
>>>> Hi all,
>>>>
>>>> I have a data set that contains samples collected over time. For
>>>> each time period, the total number of samples is given (X2). The goal
>>>> is to select 500 random samples. The selection should be based on
>>>> time (select time periods until I reach 500 samples). Also, the time
>>>> period should have X1 greater than 0; X1 is an indicator
>>>> variable.
>>>>
>>>> Select time periods with X1 > 0 until the sum of X2 is > 500.
>>>>
>>>> tab  <- read.table(textConnection(" time   X1 X2
>>>> 1      0        251
>>>> 2      5        230
>>>> 3      1        300
>>>> 4      0         25
>>>> 5      2         10
>>>> 6      3         101
>>>> 7      1         300
>>>>  8     4         185   "),header = TRUE)
>>>>
>>>> In the above example, samples from times 1 and 4 will not be selected
>>>> (X1 is zero). So I could reach my target by selecting times 6, 7, and 8,
>>>> or times 2 and 3, and so on.
>>>>
>>>> Can anyone help me do that?
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
