[R] plotting huge data

Martin Maechler maechler at stat.math.ethz.ch
Fri Aug 7 16:07:40 CEST 2009


>>>>> "FEH" == Frank E Harrell <f.harrell at vanderbilt.edu>
>>>>>     on Fri, 07 Aug 2009 07:19:16 -0500 writes:

    FEH> gauravbhatti wrote:
    >> I have a data frame with 25000 rows containing two columns Time and Distance.

That's "large" by some standards, but definitely not "huge" ...


    >> When I plot a simple distance versus time plot, the plot is very confusing,
    >> showing no general trend because of the large amount of data. Is there any way I
    >> can improve the plot by, let's say, using a moving average as in EXCEL?  Please
    >> also suggest some other methods to make the graph smoother and better looking.
    >> Gaurav

    FEH> I recommend using the quantreg package to fit a quantile regression 
    FEH> model using a spline function of Time.  Draw the estimated curves for 
    FEH> selected quantiles such as 0.1, 0.25, 0.5, 0.75, 0.9.  A new function Rq in 
    FEH> the Design package makes this easier, but you can do it with just quantreg.

Yes, modelling (with quantreg, or also lowess(), runmed(), ...) is
certainly a good idea for such a "Y ~ X" situation.
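
To make that concrete, here is a minimal sketch along the lines of
Frank's suggestion (assuming the data frame is called 'd', with the
columns Time and Distance from the original post; the spline df and
the chosen quantiles are only illustrative):

  library(quantreg)
  library(splines)

  ## 'd' is a hypothetical data frame with columns Time and Distance
  taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
  fit  <- rq(Distance ~ ns(Time, df = 5), tau = taus, data = d)

  ## evaluate the fitted quantile curves on a grid of Time values
  grid <- data.frame(Time = seq(min(d$Time), max(d$Time), length.out = 200))
  pred <- predict(fit, newdata = grid)   # one column per tau

  plot(Distance ~ Time, data = d, pch = ".", col = "grey")
  matlines(grid$Time, pred, lty = 1, col = seq_along(taus))
  legend("topleft", legend = paste("tau =", taus),
         lty = 1, col = seq_along(taus))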

But to answer the original question:

Please note that R has had for a while the very nice and useful

  smoothScatter()

function, written exactly for such cases, and also for cases
that are closer to "huge": e.g., it still works fast for
 n <- 1e6
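
A small sketch of that, again assuming a data frame 'd' with columns
Time and Distance (the added lowess() line is optional, just to show
the overall trend):

  ## density-shaded scatterplot; stays readable and fast even for very large n
  with(d, smoothScatter(Time, Distance, xlab = "Time", ylab = "Distance"))
  ## overlay a robust scatterplot smoother for the general trend
  with(d, lines(lowess(Time, Distance, f = 0.2), col = "red", lwd = 2))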

Martin Maechler, ETH Zurich



