[R] sample (randomly select) to get a number of successive days

Marc Schwartz
Mon Dec 10 14:53:44 CET 2018


Hi,

Given that your original data frame example is:

myframe <- data.frame (Timestamp=c("24.09.2012 09:00:00", "24.09.2012 10:00:00","25.09.2012 09:00:00",
                                   "25.09.2012 09:00:00","24.09.2012 09:00:00", "24.09.2012 10:00:00"),
                       Event=c(50,60,30,40,42,54))

> str(myframe)
'data.frame':	6 obs. of  2 variables:
 $ Timestamp: Factor w/ 3 levels "24.09.2012 09:00:00",..: 1 2 3 3 1 2
 $ Event    : num  50 60 30 40 42 54


Your Timestamp variable is a factor, not a datetime variable. So you first need to coerce it to one, in order to be able to define a range of dates.

Thus:

## See ?as.POSIXlt and the See Also links therein for more information on how R handles dates/times

myframe$Timestamp <- as.POSIXct(myframe$Timestamp, format = "%d.%m.%Y %H:%M:%S")

> str(myframe)
'data.frame':	6 obs. of  2 variables:
 $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
 $ Event    : num  50 60 30 40 42 54
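
As a small aside, here is a minimal sketch of the same conversion with an explicit time zone and a check for values that fail to parse. The tz = "UTC" setting and the names ts_fac, ts and bad are assumptions made for illustration only; use the zone in which your timestamps were actually recorded.

```r
# Example timestamps stored as a factor, mirroring the str() output above
ts_fac <- factor(c("24.09.2012 09:00:00", "24.09.2012 10:00:00",
                   "25.09.2012 09:00:00", "25.09.2012 09:00:00",
                   "24.09.2012 09:00:00", "24.09.2012 10:00:00"))

# Convert via character; tz = "UTC" is an assumption, not known from the data
ts <- as.POSIXct(as.character(ts_fac), format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# A string that does not match the format is returned as NA, without an error
bad <- as.POSIXct("2012-09-24 09:00:00", format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# So it is worth confirming that nothing was lost before going on
stopifnot(!anyNA(ts))
is.na(bad)
```

Checking for NA right after the conversion catches a wrong format string early, before the dates are used for subsetting.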


Since you appear to care only about the day, not the time, when selecting the range, let's keep it simple and use the date part of the datetime to define your interval. For clarity, let's create a new column in the data frame that holds just the date:

myframe$day <- as.Date(myframe$Timestamp)

> str(myframe)
'data.frame':	6 obs. of  3 variables:
 $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
 $ Event    : num  50 60 30 40 42 54
 $ day      : Date, format: "2012-09-24" ...


> myframe
            Timestamp Event        day
1 2012-09-24 09:00:00    50 2012-09-24
2 2012-09-24 10:00:00    60 2012-09-24
3 2012-09-25 09:00:00    30 2012-09-25
4 2012-09-25 09:00:00    40 2012-09-25
5 2012-09-24 09:00:00    42 2012-09-24
6 2012-09-24 10:00:00    54 2012-09-24
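
One thing to watch when deriving the day from a datetime: the calendar day depends on the time zone used for the conversion. The sketch below illustrates this with a single made-up timestamp (the value and the "Europe/Berlin" zone are assumptions chosen for the illustration), and passes tz explicitly rather than relying on a default.

```r
# A timestamp shortly after midnight, in a zone that is ahead of UTC
x <- as.POSIXct("2012-09-25 00:30:00", tz = "Europe/Berlin")

# Converting in the zone of the clock time keeps the local calendar day
day_local <- as.Date(x, tz = "Europe/Berlin")

# Converting relative to UTC lands on the previous day (22:30 UTC on the 24th)
day_utc <- as.Date(x, tz = "UTC")

day_local
day_utc
```

For the 09:00 and 10:00 timestamps in the example this makes no difference in most zones, but with observations close to midnight it can move rows from one day to another.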


With that in place, randomly select a starting date from the day column. In the output below, the draw happens to be 2012-09-24. You can then use ?seq.Date to define the range:

set.seed(1)
start <- sample(myframe$day, 1)

> start
[1] "2012-09-24"

> str(start)
 Date[1:1], format: "2012-09-24"


So, create the range of 25 dates:

> seq(start, length.out = 25, by = "day")
 [1] "2012-09-24" "2012-09-25" "2012-09-26" "2012-09-27" "2012-09-28"
 [6] "2012-09-29" "2012-09-30" "2012-10-01" "2012-10-02" "2012-10-03"
[11] "2012-10-04" "2012-10-05" "2012-10-06" "2012-10-07" "2012-10-08"
[16] "2012-10-09" "2012-10-10" "2012-10-11" "2012-10-12" "2012-10-13"
[21] "2012-10-14" "2012-10-15" "2012-10-16" "2012-10-17" "2012-10-18"


Now, use the result of the above to subset your data frame. See ?subset and ?"%in%":

myframe.rand <- subset(myframe, day %in% seq(start, length.out = 25, by = "day"))
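
An equivalent way to express the same selection is a range comparison, since Date objects support arithmetic in whole days. This is only a sketch on a small made-up data frame (the rows, including the one dated 2012-10-30, are assumptions added to show a row falling outside the window):

```r
# Small stand-in data frame with a 'day' column of class Date
myframe <- data.frame(day = as.Date(c("2012-09-24", "2012-09-24",
                                      "2012-09-25", "2012-10-30")),
                      Event = c(50, 60, 30, 40))

start  <- as.Date("2012-09-24")
n_days <- 25

# Last day of the window: 24 days after the start
end <- start + n_days - 1

# Keep rows whose day lies in the closed interval [start, end]
in_window <- myframe$day >= start & myframe$day <= end
myframe.rand <- myframe[in_window, ]

end
nrow(myframe.rand)
```

Unlike subset(), logical indexing keeps rows where the condition is NA (as all-NA rows), so this form assumes the day column has no missing values.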


In your example, all rows will be returned, but with your larger dataset you will get only the rows whose dates fall within the defined range.

Given the above, I will leave it to you to restrict the candidate starting dates in your full dataset so that the starting date falls at least 24 days before your 'max' date, which guarantees that 25 consecutive days are available.
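
For readers who want a starting point for that last step, here is one possible sketch. The function name pick_window, its arguments, and the demo data are all hypothetical. It assumes the data frame has a Date column with no missing values, and it treats only days that actually occur in the data as candidate starting days.

```r
# Hypothetical helper: return the rows of 'df' that fall in a randomly
# placed window of 'n_days' consecutive calendar days
pick_window <- function(df, n_days = 25, day_col = "day") {
  days <- df[[day_col]]

  # Latest start that still leaves n_days days up to the maximum date
  last_start <- max(days) - (n_days - 1)

  # Candidate starts: distinct observed days that are early enough
  candidates <- unique(days[days <= last_start])
  if (length(candidates) == 0) {
    stop("the data span fewer than 'n_days' days")
  }

  # Draw an index so that each candidate day is equally likely,
  # regardless of how many rows it has
  start <- candidates[sample.int(length(candidates), 1)]

  window <- seq(start, length.out = n_days, by = "day")
  df[days %in% window, , drop = FALSE]
}

# Made-up demo data: two observations per day over 60 days
set.seed(42)
demo <- data.frame(
  day   = rep(seq(as.Date("2012-09-01"), by = "day", length.out = 60), each = 2),
  Event = seq_len(120)
)

res <- pick_window(demo, n_days = 25)
range(res$day)
```

Sampling from the distinct days rather than from the day column itself is a design choice: drawing from the column would make days with more rows more likely to be picked as the start. If any calendar day in the span should be eligible, including days without observations, the candidates could instead be built with seq() from the minimum date to last_start.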

Regards,

Marc Schwartz


> On Dec 10, 2018, at 2:37 AM, Dagmar Cimiotti <dagmar.cimiotti using ftz-west.uni-kiel.de> wrote:
> 
> Hi Marc,
> 
> Yes, you got it to the point! That is exactly what I want. But I do not know how to do that. I know how to randomly pick the first day but I do not know how to set a range of values which cover the 25 days starting from that random value.
> 
> Best,
> Dagmar
> 
> 
> Hi,
> 
> I am confused.
> 
> As far as I can tell, only the first day is selected randomly from your dataset. The subsequent 24 days are deterministic, since they need to be consecutive days from the first day, for a total of 25 consecutive days.
> 
> Thus, all you need to do is to randomly select 1 day from within the time range of your dataset to be the first day, that is also far enough from the maximum date, to allow you to then select the data from the additional 24 consecutive days.
> 
> So randomly pick your first day and set a range of values, covering the 25 days, to use to then subset your full dataset.
> 
> What am I missing?
> 
> Regards,
> 
> Marc Schwartz
> 
> 
>> On Dec 7, 2018, at 7:18 PM, Dagmar Cimiotti<dagmar.cimiotti using ftz-west.uni-kiel.de>  wrote:
>> 
>> Hi Jim and everyone else,
>> 
>> Mhm, no this is not what I am looking for. I think in your way I would
>> randomly sample two values of day 1 and of day 2. But I want the
>> opposite: I want to randomly draw two successive (!) days and put those
>> values in a new dataframe to continue working with them.
>> 
>> In my real data I do have a huge time span and I want to draw 25
>> consecutive days. So maybe my example was a little misleading. And now
>> that I read it again my text was, too. Sorry about that!
>> 
>> Good try though, and I am very grateful for your willingness to help me.
>> Would anyone give it another try?
>> 
>> Dagmar
>> 
>> Am 07.12.2018 um 10:30 schrieb Jim Lemon:
>>> Hi Dagmar,
>>> This will probably involve creating a variable to differentiate the
>>> two days in each data.frame:
>>> 
>>> myframe$day<-as.Date(as.character(myframe$Timestamp),"%d.%m.%Y %H:%M:%S")
>>> days<-unique(myframe$day)
>>> 
>>> Then just sample the two subsets and concatenate them:
>>> 
>>> myframe[c(sample(which(myframe$day==days[1]),2),
>>>   sample(which(myframe$day==days[2]),2)),]
>>> 
>>> Jim
>>> 
>>> 
>>> On Fri, Dec 7, 2018 at 8:08 PM Dagmar Cimiotti
>>> <dagmar.cimiotti using ftz-west.uni-kiel.de>  wrote:
>>>> Dear all,
>>>> 
>>>> I have data from a time span like this:
>>>> 
>>>> myframe <- data.frame (Timestamp=c("24.09.2012 09:00:00", "24.09.2012 10:00:00","25.09.2012 09:00:00",
>>>>                                    "25.09.2012 09:00:00","24.09.2012 09:00:00", "24.09.2012 10:00:00"),
>>>>                        Event=c(50,60,30,40,42,54) )
>>>> myframe
>>>> 
>>>> 
>>>> I want to create a new dataframe which includes in this example the
>>>> data from two successive days (in my real data I have a big time span
>>>> and want data from 25 consecutive days). I understand that I can do a
>>>> simple sample like this
>>>> 
>>>> mysample <- myframe[sample(1:nrow(myframe), 4,replace=FALSE),]
>>>> mysample
>>>> 
>>>> But I need the data from consecutive days in my random sample. Can
>>>> anyone help me with this?
>>>> 
>>>> 
>>>> Many thanks in advance,
>>>> Dagmar


