[R] importing and filtering time series data

jim holtman jholtman at gmail.com
Mon May 2 01:39:20 CEST 2011


Here is one approach.  In future it would help to provide a reasonable sample
of data; since there was none, I generated some test data first:

> x <- unclass(Sys.time())  # current time, in seconds since the epoch
> # create some data
> # increments by ~ 0.1 seconds
> len <- cumsum(runif(100, 0, 0.1))
> dataFile <- data.frame(time = x + len,
+                        flag = sample(c("Y", "N"), 100, TRUE),
+                        dur = runif(100, 10,1000)
+                       )
> write.csv(dataFile, file = 'myData.csv', row.names = FALSE)
>
> # read the data and summarize by 1 second intervals
> input <- read.csv('myData.csv')
> # remove the "N" rows (keep only "Y")
> input <- subset(input, flag == "Y")
> require(data.table)  # I like this for creating summaries
> input <- data.table(input)
> # add column for summary
> input$key <- factor(trunc(input$time))
> input[,
+     list(count = length(time)
+        , latency = mean(dur)
+        , var = var(dur)
+        , '5%' = quantile(dur, probs = 0.05)
+        , '95%' = quantile(dur, probs = 0.95)
+        )
+     , by = key
+     ]
            key count  latency       var       X5.     X95.
[1,] 1304293090     6 558.3471  73765.28 255.09390 872.3692
[2,] 1304293091     8 580.4440 103743.05 132.39461 963.2297
[3,] 1304293092    10 494.1759  62945.55 150.89719 869.8083
[4,] 1304293093    10 557.1942 105834.81 102.53878 941.1442
[5,] 1304293094    17 477.2077 106452.72  35.15032 947.0750
>
>
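
The same summary can also be sketched in base R, reading a file in the
header-less layout described in the original question and adding the average
number of requests per second, which the table above does not report directly.
This is only a rough sketch under some assumptions: the function name
summarize_requests and the column names time, flag and dur are made up for
illustration, and the last two rows of the test file are invented. Note that
the average rate is taken only over seconds that saw at least one "Y" request,
so idle seconds are not counted.

```r
# Hypothetical helper: per-second summary of a header-less CSV with the
# columns <timestamp>,<Y|N>,<duration>. Column names are assumed.
summarize_requests <- function(file) {
  input <- read.csv(file, header = FALSE,
                    col.names = c("time", "flag", "dur"),
                    colClasses = c("numeric", "character", "numeric"))
  # keep only the "Y" rows
  input <- input[input$flag == "Y", ]
  # group durations by whole second of the timestamp
  by_second <- split(input$dur, trunc(input$time))
  per_second <- do.call(rbind, lapply(by_second, function(d) {
    data.frame(count   = length(d),
               latency = mean(d),
               var     = var(d),   # NA when a second has a single request
               p05     = quantile(d, probs = 0.05, names = FALSE),
               p95     = quantile(d, probs = 0.95, names = FALSE))
  }))
  # average requests per second, over seconds with at least one request
  list(per_second = per_second, mean_rate = mean(per_second$count))
}

# Usage: the first two rows come from the original question,
# the last two are invented test rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1304083104.41,Y,668.856249809",
             "1304083104.41,Y,348.143193007",
             "1304083105.10,N,100",
             "1304083105.20,Y,200"), tmp)
res <- summarize_requests(tmp)
print(res$per_second)
print(res$mean_rate)
```

Passing colClasses explicitly keeps the flag column as plain character data
regardless of the R version's default for strings.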


On Fri, Apr 29, 2011 at 11:27 AM, Joel Reymont <joelr1 at gmail.com> wrote:
> Folks,
>
> I'm new to R and would like to use it to analyze web server performance data.
>
> I collect the data in this CSV format:
>
> 1304083104.41,Y,668.856249809
> 1304083104.41,Y,348.143193007
>
> First column is a <seconds.microseconds> timestamp, rows with N instead of Y need to be skipped and the last column has the same format as the first column, except it's request duration (latency).
>
> I would like to calculate average number of requests per second, mean latency, variance, 5 and 95 percentiles.
>
> What is the best way to accomplish this, starting with importing of time series?
>
>        Thanks, Joel
>
> --------------------------------------------------------------------------
> - for hire: mac osx device driver ninja, kernel extensions and usb drivers
> ---------------------+------------+---------------------------------------
> http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
> ---------------------+------------+---------------------------------------
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


