[R] Selecting ranges of dates from a dataframe

Benjamin Stier benjamin.stier at ub.uni-tuebingen.de
Thu Mar 10 14:23:48 CET 2011


Hello list!

I have a data.frame which looks like this:
> serv
                  datum op.read op.write   read   write
1   2011-01-29 10:00:00       0        0      0       0
2   2011-01-29 10:00:01       0        0      0       0
3   2011-01-29 10:00:02       0        0      0       0
4   2011-01-29 10:00:03       0        4      0  647168
5   2011-01-29 10:00:04       0        0      0       0
6   2011-01-29 10:00:05       0       14      0 1960837
7   2011-01-29 10:00:06       0        0      0       0
...
115 2011-01-30 10:00:54       0        0      0       0
116 2011-01-30 10:00:55       0        0      0       0
117 2011-01-30 10:00:56       0        0      0       0
118 2011-01-30 10:00:57      54        0  29184       0
119 2011-01-30 10:00:58     204        0 122880       0
120 2011-01-30 10:00:59       0        0      0       0
...

I want to compare the total read/write volume of each day. I already have a
solution, but it is pretty slow.

# read the data
serv <- read.delim("cut.inp")

# Reformat the dates from the file
serv$datum <- strptime(serv$datum,  "%Y-%m-%d %H:%M:%S")

# select all single days
dates.serv <- unique(strptime(serv$datum, format="%Y-%m-%d"))

# create an empty data.frame to collect the per-day sums
values <- data.frame(row.names=1, datum=numeric(0), write=numeric(0), read=numeric(0))
for(i in as.character(dates.serv)) {
        # build the start and end timestamps of the day
        searchstart <- as.POSIXlt(paste(i, "00:00:00", sep=" "))
        searchend <- as.POSIXlt(paste(i, "23:59:59", sep=" "))
        # select all values from a specific day
        day <- serv[(serv$datum >= searchstart & serv$datum <= searchend),]
        write <- as.numeric(sum(as.numeric(day$write)))
        read <- as.numeric(sum(as.numeric(day$read)))
        # add to the data.frame
        values <- rbind(values, data.frame(datum=i, write=write, read=read))
}

This is my first attempt at using R for statistics, so I'm sure this isn't the
best solution.
The for-loop does its job but, as I said, it is really slow. My data covers 21
days at one line per second (roughly 1.8 million rows).
Is there a better way to select the date ranges than a for-loop? The line
where I select all values for "day" seems to be the most expensive. Any ideas?
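
For illustration, here is one possible loop-free way to get the same per-day
sums. It is only a sketch under a few assumptions: the small `serv` below is a
hypothetical stand-in built from a handful of rows of the sample data, the
timestamps are assumed to always have the form "YYYY-MM-DD HH:MM:SS", and
`day`, `vals` and `totals` are names introduced here. Instead of comparing
every timestamp against a start and end time once per day, it derives a day
label for each row and lets base R's rowsum() add up each group in one pass.

```r
# Hypothetical stand-in for the real data (same column names as in the post).
serv <- data.frame(
  datum = c("2011-01-29 10:00:03", "2011-01-29 10:00:05",
            "2011-01-30 10:00:13", "2011-01-30 10:00:57"),
  read  = c(0, 0, 0, 29184),
  write = c(647168, 1960837, 1279262, 0),
  stringsAsFactors = FALSE
)

# The first 10 characters of each timestamp are the calendar day, so no
# date parsing is needed just to group the rows by day.
day <- substr(as.character(serv$datum), 1, 10)

# Convert to double first so that large daily totals cannot overflow
# R's integer type (read.delim() would read these columns as integers).
vals <- data.frame(write = as.numeric(serv$write),
                   read  = as.numeric(serv$read))

# rowsum() sums every column within each group; the groups become row names.
totals <- rowsum(vals, group = day)

# Same layout as `values` in the loop version: one row per day.
values <- data.frame(datum = rownames(totals),
                     write = totals$write,
                     read  = totals$read,
                     stringsAsFactors = FALSE)
print(values)
```

If `serv$datum` has already been converted with strptime(), format(serv$datum,
"%Y-%m-%d") should give the same day labels as the substr() call. Something
like aggregate(cbind(write, read) ~ day, data = ..., FUN = sum) would be
another option; rowsum() is used here only because it is a single base R call.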

Kind regards,

Benjamin

PS: I have attached some sample data, in case you want to try it yourself.
-------------- next part --------------
datum	op.read	op.write	read	write
2011-01-29 10:00:00	0	0	0	0
2011-01-29 10:00:01	0	0	0	0
2011-01-29 10:00:02	0	0	0	0
2011-01-29 10:00:03	0	4	0	647168
2011-01-29 10:00:04	0	0	0	0
2011-01-29 10:00:05	0	14	0	1960837
2011-01-29 10:00:06	0	0	0	0
2011-01-29 10:00:07	0	611	0	3533701
2011-01-29 10:00:08	1	0	9728	0
2011-01-29 10:00:09	0	0	0	0
2011-01-29 10:00:10	3	0	13824	0
2011-01-29 10:00:11	1	0	1023	0
2011-01-29 10:00:12	2	1	13824	90112
2011-01-29 10:00:13	0	0	0	0
2011-01-29 10:00:14	0	0	0	0
2011-01-29 10:00:15	0	0	0	0
2011-01-29 10:00:16	0	0	0	0
2011-01-29 10:00:17	0	0	0	0
2011-01-29 10:00:18	0	0	0	0
2011-01-29 10:00:19	0	0	0	0
2011-01-29 10:00:20	0	0	0	0
2011-01-29 10:00:21	0	0	0	0
2011-01-29 10:00:22	0	0	0	0
2011-01-29 10:00:23	0	0	0	0
2011-01-29 10:00:24	0	0	0	0
2011-01-29 10:00:25	0	0	0	0
2011-01-29 10:00:26	0	0	0	0
2011-01-29 10:00:27	0	0	0	0
2011-01-29 10:00:28	0	0	0	0
2011-01-29 10:00:29	0	0	0	0
2011-01-29 10:00:30	0	0	0	0
2011-01-29 10:00:31	0	0	0	0
2011-01-29 10:00:32	0	0	0	0
2011-01-29 10:00:33	0	0	0	0
2011-01-29 10:00:34	0	0	0	0
2011-01-29 10:00:35	0	0	0	0
2011-01-29 10:00:36	0	0	0	0
2011-01-29 10:00:37	0	651	0	3397386
2011-01-29 10:00:38	0	0	0	0
2011-01-29 10:00:39	0	0	0	0
2011-01-29 10:00:40	0	0	0	0
2011-01-29 10:00:41	0	0	0	0
2011-01-29 10:00:42	0	0	0	0
2011-01-29 10:00:43	0	0	0	0
2011-01-29 10:00:44	0	0	0	0
2011-01-29 10:00:45	0	0	0	0
2011-01-29 10:00:46	0	0	0	0
2011-01-29 10:00:47	0	0	0	0
2011-01-29 10:00:48	0	0	0	0
2011-01-29 10:00:49	0	0	0	0
2011-01-29 10:00:50	0	0	0	0
2011-01-29 10:00:51	0	0	0	0
2011-01-29 10:00:52	0	0	0	0
2011-01-29 10:00:53	8	0	20480	0
2011-01-29 10:00:54	42	0	63488	0
2011-01-29 10:00:55	58	4	721920	655360
2011-01-29 10:00:56	16	3	29696	524288
2011-01-29 10:00:57	0	0	0	131072
2011-01-29 10:00:58	17	0	27648	0
2011-01-29 10:00:59	26	5	119808	786432
2011-01-30 10:00:00	0	0	0	0
2011-01-30 10:00:01	0	0	2560	0
2011-01-30 10:00:02	0	0	0	0
2011-01-30 10:00:03	0	0	0	0
2011-01-30 10:00:04	0	0	0	0
2011-01-30 10:00:05	0	0	0	0
2011-01-30 10:00:06	0	0	0	0
2011-01-30 10:00:07	0	0	0	0
2011-01-30 10:00:08	0	0	0	0
2011-01-30 10:00:09	0	0	0	0
2011-01-30 10:00:10	0	0	0	0
2011-01-30 10:00:11	0	0	0	0
2011-01-30 10:00:12	0	0	0	0
2011-01-30 10:00:13	0	433	0	1279262
2011-01-30 10:00:14	0	5	0	49152
2011-01-30 10:00:15	0	0	0	0
2011-01-30 10:00:16	0	0	0	0
2011-01-30 10:00:17	0	0	0	0
2011-01-30 10:00:18	0	0	0	0
2011-01-30 10:00:19	0	0	0	0
2011-01-30 10:00:20	0	0	0	0
2011-01-30 10:00:21	0	0	0	0
2011-01-30 10:00:22	0	0	0	0
2011-01-30 10:00:23	0	0	0	0
2011-01-30 10:00:24	0	0	0	0
2011-01-30 10:00:25	0	4	1023	327680
2011-01-30 10:00:26	10	0	36352	0
2011-01-30 10:00:27	1	0	6144	0
2011-01-30 10:00:28	21	0	52736	0
2011-01-30 10:00:29	0	0	0	0
2011-01-30 10:00:30	0	0	0	0
2011-01-30 10:00:31	0	0	0	0
2011-01-30 10:00:32	25	0	86016	0
2011-01-30 10:00:33	0	0	0	0
2011-01-30 10:00:34	0	0	0	0
2011-01-30 10:00:35	0	0	0	0
2011-01-30 10:00:36	0	0	0	0
2011-01-30 10:00:37	0	0	0	0
2011-01-30 10:00:38	0	0	0	0
2011-01-30 10:00:39	0	0	0	0
2011-01-30 10:00:40	3	0	7168	0
2011-01-30 10:00:41	0	0	0	0
2011-01-30 10:00:42	0	0	0	0
2011-01-30 10:00:43	95	204	359424	992256
2011-01-30 10:00:44	121	364	381952	1572864
2011-01-30 10:00:45	0	0	0	0
2011-01-30 10:00:46	0	0	1023	0
2011-01-30 10:00:47	0	0	0	0
2011-01-30 10:00:48	0	0	0	0
2011-01-30 10:00:49	0	0	0	0
2011-01-30 10:00:50	0	0	0	0
2011-01-30 10:00:51	0	0	0	0
2011-01-30 10:00:52	0	3	3072	413696
2011-01-30 10:00:53	0	0	0	0
2011-01-30 10:00:54	0	0	0	0
2011-01-30 10:00:55	0	0	0	0
2011-01-30 10:00:56	0	0	0	0
2011-01-30 10:00:57	54	0	29184	0
2011-01-30 10:00:58	204	0	122880	0
2011-01-30 10:00:59	0	0	0	0

