[R] Preparing data for display

Stavros Macrakis macrakis at alum.mit.edu
Mon Nov 10 22:28:38 CET 2008


I have a dataset of about 10^6 rows, each consisting of a timestamp,
several factors, a string, some integers, and some floats.

I'd like to graph this data in various ways, some straightforward
(how many events per week over the past year for each of 4 values
of some factor) and some less so.  I've managed to do this
by brute force, but I'd like to learn how to do it in more elegant,
more R-like code.
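
To make the straightforward case concrete, here is a minimal sketch of
counting events per week for each value of a factor. The data is synthetic
and the column names `time` and `factor` are assumptions standing in for
the real dataset; it relies only on base R's `cut`, `table`, and `matplot`.

```r
# Synthetic stand-in for the real data; column names are assumptions.
set.seed(1)
n <- 1000
events <- data.frame(
  time   = as.POSIXct("2008-01-01", tz = "UTC") + runif(n, 0, 300 * 86400),
  factor = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)

# Bin timestamps into calendar weeks, then cross-tabulate against the factor.
week   <- cut(as.Date(events$time), breaks = "week")
counts <- table(week, events$factor)   # one row per week, one column per level

# One line per factor level on a shared set of axes.
matplot(as.Date(rownames(counts)), unclass(counts), type = "l", lty = 1,
        xlab = "week", ylab = "events")
```

The cross-tabulation keeps weeks with no events as zero rows, which may or
may not be what is wanted for the real data.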

Consider for example the following, which graphs the 25th, 50th, and
75th percentile values per day of data$x for one value of a factor:

perc <- function(code,data)
{ # select the part of the data with this factor value
  slice <- data[data$factor == code,];
  # calc quartiles for each day
  quarts <- tapply(slice$x,
                   slice$day,
                   function(x) quantile(x,c(.25,.50,.75)));
  # returns a tagged list of tagged vectors
  # list("2008-10-07" = c("25%" = .05, "50%" = .47, ... ) , ...)
  # convert to a data frame -- is there some mapping function to do this?
  fr <- data.frame( day = to.time(names(quarts)),   # strings back to dates (!)
                    "25%" = sapply(quarts, function(x) x[[1]] ),   # !!
                    "50%" = sapply(quarts, function(x) x[[2]] ),
                    "75%" = sapply(quarts, function(x) x[[3]] ) );
  # columns are now labelled "X25." etc. (!)
  # overlay the three quartile columns (2:4) on a common y scale
  for (i in 2:4) { plot( fr$day, fr[[i]], type="l",
                         ylim = c( 0, max(fr[[2]],fr[[3]],fr[[4]]) ));
                   par(new=TRUE); }
  par(new=FALSE);
}

This works, but is pretty ugly in a variety of ways.  What is the
right way to do this?
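
One direction that might be less clumsy, offered only as a sketch on
synthetic data: stack the per-day quantile vectors into a matrix with
`do.call(rbind, ...)` and draw all columns at once with `matplot`. The
name `perc2` and the `probs` argument are hypothetical, and `day` is
assumed to be a Date column so that `as.Date` can stand in for the
string-to-date conversion.

```r
# Synthetic stand-in; `day`, `factor`, and `x` mirror the names used above.
set.seed(2)
n <- 2000
data <- data.frame(
  day    = as.Date("2008-10-01") + sample(0:29, n, replace = TRUE),
  factor = sample(c("a", "b"), n, replace = TRUE),
  x      = runif(n)
)

perc2 <- function(code, data, probs = c(.25, .50, .75)) {
  slice <- data[data$factor == code, ]
  # tapply returns one named quantile vector per day ...
  quarts <- tapply(slice$x, slice$day, quantile, probs = probs)
  # ... and do.call(rbind, ...) stacks them into a days-by-quantiles matrix.
  m <- do.call(rbind, quarts)
  # matplot draws every column against the same x values in a single call.
  matplot(as.Date(rownames(m)), m, type = "l", lty = 1,
          xlab = "day", ylab = "x")
  invisible(m)
}

m <- perc2("a", data)
```

Keeping the result as a matrix avoids the column renaming that
`data.frame` applies to names like "25%", and a single `matplot` call
replaces the `par(new=TRUE)` loop and its hand-computed `ylim`.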

Thanks,

           -s


