[R] Analyzing event times with densityplot

David Lindelöf lindelof at ieee.org
Tue Feb 16 20:18:07 CET 2010


Dear useRs,

I have a file with a sequence of event timestamps, for instance the
times at which someone visits a website:

02.02.2010 09:00:00
02.02.2010 09:00:00
02.02.2010 09:00:00
02.02.2010 09:00:01
02.02.2010 09:00:03
02.02.2010 09:00:05
02.02.2010 09:00:06
02.02.2010 09:00:06
02.02.2010 09:00:09
02.02.2010 09:00:11
02.02.2010 09:00:11
02.02.2010 09:00:11
etc., for several thousand rows.

I'd like to get an idea of how the web hits are distributed over time,
over the week, etc. I extracted the data into a data frame and tried
plotting density plots:

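library(lattice)

## Read one timestamp per line and parse it as "dd.mm.yyyy HH:MM:SS"
data <- as.POSIXct(scan("data.txt",
                        what=character(0),
                        sep="\n"),
                   format="%d.%m.%Y %T")
data.lt <- as.POSIXlt(data)
## Add uniform noise in [-0.5, 0.5] s to the seconds so that identical
## timestamps do not all fall on exactly the same value
data.df <- data.frame(time=data,
                      sec=jitter(data.lt$sec, amount=.5),
                      min=data.lt$min,
                      hour=data.lt$hour,
                      wday=weekdays(data))

## Density of the time of day (seconds since midnight), one panel per weekday
densityplot(~(sec+60*min+3600*hour)|wday,
            data.df,
            plot.points=FALSE)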
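Since the stated aim is to see how hits are distributed over the day and over the week, plain counts per hour and weekday may be a useful complement to the density estimate. The following is a minimal sketch under stated assumptions, not a tested recipe for this particular log: the timestamps are synthetic (2000 uniform visits over the week starting Monday 1 Feb 2010), the UTC time zone is an assumption, and the names `times`, `hour`, `wday` and `counts` are hypothetical. Swapping in the real `data` vector from the code above should be enough to apply it to the actual log.

```r
library(lattice)

## Hypothetical synthetic data standing in for the real log: 2000 visits
## spread uniformly over the week starting Monday 1 Feb 2010 (UTC assumed).
set.seed(1)
start <- as.POSIXct("2010-02-01 00:00:00", tz = "UTC")
times <- start + sort(runif(2000, min = 0, max = 7 * 24 * 3600))
lt <- as.POSIXlt(times)

## Fixed factor levels keep the weekdays in calendar order (not alphabetical)
## and keep hours without any visit in the table as zero counts.
hour <- factor(lt$hour, levels = 0:23)
wday <- factor(lt$wday, levels = c(1:6, 0),
               labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

## One row per (hour, weekday) pair, with the number of visits in Freq
counts <- as.data.frame(table(hour = hour, wday = wday))

## One panel per weekday; bar height = number of visits in that hour
print(barchart(Freq ~ hour | wday, data = counts,
               xlab = "Hour of day", ylab = "Number of visits",
               scales = list(x = list(rot = 90, cex = 0.6))))
```

Because the y-axis is a plain count, it reads directly as visits per hour on that weekday. With several weeks of data, dividing by the number of weeks would give an average. lattice's histogram() also accepts type = "count" for the same kind of display with other bin widths.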


1) Is a densityplot the most appropriate way to analyze this kind of data?

2) The densityplot shows an estimated probability density function
(PDF), but I'd rather see the number of visits per second on the
y-axis. How can I do that?

3) I've found that the shape of the plot depends heavily on the chosen
bandwidth. Ideally I'd like to identify spikes when several visitors
come to the site at the "same" time (say, within 5 seconds of each
other). How should I choose the bandwidth (and, for that matter, the
kernel)?
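Questions 2 and 3 can both be approached with base R's density(), sketched below on the twelve sample rows from the post. Two documented properties of density() do the work: the estimate integrates to (approximately) 1, so multiplying it by the number of events n gives visits per second when x is measured in seconds; and bw is the standard deviation of the kernel, so a rectangular kernel with bw = 2.5 / sqrt(3) averages over a box of ±2.5 s. The UTC time zone, the half-open window [t, t + 5) used in the exact count, and the spike threshold of 5 visits are assumptions chosen for illustration; `n_win` and `spikes` are hypothetical names.

```r
## Sample rows from the post; with the real log these would come from data.txt
stamps <- c("02.02.2010 09:00:00", "02.02.2010 09:00:00",
            "02.02.2010 09:00:00", "02.02.2010 09:00:01",
            "02.02.2010 09:00:03", "02.02.2010 09:00:05",
            "02.02.2010 09:00:06", "02.02.2010 09:00:06",
            "02.02.2010 09:00:09", "02.02.2010 09:00:11",
            "02.02.2010 09:00:11", "02.02.2010 09:00:11")
lt <- as.POSIXlt(stamps, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
tod <- lt$sec + 60 * lt$min + 3600 * lt$hour   # seconds since midnight
n <- length(tod)

## (2) density() integrates to about 1 over x, so n * density is a rate in
##     visits per unit of x, i.e. visits per second here.
## (3) bw is the kernel's standard deviation; a rectangular kernel of
##     half-width h has SD h / sqrt(3), so this bw gives a +/- 2.5 s box.
d <- density(tod, bw = 2.5 / sqrt(3), kernel = "rectangular")
rate <- n * d$y          # estimated visits per second around each x
in_window <- 5 * rate    # approx. number of visits within +/- 2.5 s of x

plot(d$x, rate, type = "l",
     xlab = "Seconds since midnight", ylab = "Visits per second")

## Exact alternative without smoothing: for each visit t, count the visits
## in [t, t + 5). findInterval(x, v, left.open = TRUE) counts values of v < x.
v <- sort(tod)
n_win <- findInterval(v + 5, v, left.open = TRUE) -
  findInterval(v, v, left.open = TRUE)

## Hypothetical threshold: flag windows containing at least 5 visits
spikes <- unique(v[n_win >= 5])
print(spikes)   # 32400, i.e. 09:00:00
```

The rectangular kernel is used here because it matches the "within 5 seconds" criterion literally: the curve's height times 5 is a count of visits in a 5-second window. A Gaussian kernel with a similar bandwidth gives a smoother curve, but its height no longer maps onto a simple count. For locating spikes, the findInterval() version avoids smoothing altogether and should stay fast on several thousand rows.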

Your help would be much, much appreciated.


-- 
::::::::::::::::::::::::::::::::::::::::::::
David Lindelöf, Ph.D.
+41 (0)79 415 66 41 or skype:david.lindelof
Better Software for Tomorrow's Cities:
  http://computersandbuildings.com
Follow me on Twitter:
  http://twitter.com/dlindelof



More information about the R-help mailing list