[R] Plot of large dataset

Duncan Murdoch murdoch at stats.uwo.ca
Mon Sep 8 19:01:09 CEST 2008


I'd start with scatterplots of the two subsets (pass vs fail), but with 
280k points, those are likely to be fairly uninformative masses of black 
ink.  However, there might be enough separation between them that you 
don't need anything else.
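
As a rough sketch of that first step, here is one way to draw the two 
scatterplots side by side on common axes.  The data are simulated as a 
stand-in for the real file, and the column names (pass, x, y) and the 
coding 1 = pass, 0 = fail are assumptions, not something taken from the 
original post.

```r
# Simulated stand-in for the real 280k-line file.
# Column names and the 1 = pass / 0 = fail coding are assumptions.
set.seed(1)
n <- 280000
dat <- data.frame(pass = rbinom(n, 1, 0.5),
                  x    = rnorm(n),
                  y    = rnorm(n))

# Split into the two subsets
pass_pts <- dat[dat$pass == 1, ]
fail_pts <- dat[dat$pass == 0, ]

# Two panels with identical axis limits so the clouds are comparable
xl <- range(dat$x)
yl <- range(dat$y)
op <- par(mfrow = c(1, 2))
plot(pass_pts$x, pass_pts$y, pch = ".", xlim = xl, ylim = yl,
     xlab = "X", ylab = "Y", main = "Pass")
plot(fail_pts$x, fail_pts$y, pch = ".", xlim = xl, ylim = yl,
     xlab = "X", ylab = "Y", main = "Fail")
par(op)  # restore the previous layout
```

Using pch = "." keeps each point as small as possible, which delays the 
"mass of black ink" problem but does not remove it.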

If not, then a pair of hexbin plots (from the Bioconductor hexbin 
package), e.g.

library(hexbin)
plot(hexbin(rnorm(280000), rnorm(280000)))

may work.  Other possibilities are to use partially transparent points, 
and possibly to use jittering if there are a lot of ties.
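
A minimal sketch of those suggestions follows, again on simulated data.  
The column names, the 1 = pass coding, the rounding used to create ties, 
the alpha value and the jitter amount are all assumptions chosen for 
illustration; the hexbin part only runs if the hexbin package is 
installed.

```r
# Simulated data with many ties (coordinates rounded to one decimal).
# Column names and the 1 = pass / 0 = fail coding are assumptions.
set.seed(2)
n <- 280000
dat <- data.frame(pass = rbinom(n, 1, 0.5),
                  x    = round(rnorm(n), 1),
                  y    = round(rnorm(n), 1))

# One hexbin plot per subset, using the same bounds for both
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  xb <- range(dat$x)
  yb <- range(dat$y)
  for (g in c(1, 0)) {
    sub <- dat[dat$pass == g, ]
    hb <- hexbin(sub$x, sub$y, xbnds = xb, ybnds = yb)
    plot(hb, main = if (g == 1) "Pass" else "Fail")
  }
}

# Partially transparent points: element 1 is used for fail, 2 for pass
cols <- c(rgb(1, 0, 0, alpha = 0.05), rgb(0, 0, 1, alpha = 0.05))

# Jitter by up to half the rounding step to spread tied points out
xj <- jitter(dat$x, amount = 0.05)
yj <- jitter(dat$y, amount = 0.05)

plot(xj, yj, pch = 16, cex = 0.4, col = cols[dat$pass + 1],
     xlab = "X", ylab = "Y")
legend("topright", legend = c("fail (0)", "pass (1)"),
       pch = 16, col = c("red", "blue"))
```

Transparency needs a graphics device that supports it (pdf() does), and 
the alpha value will need tuning to the actual density of the data.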

I would avoid 3D histograms; they aren't nearly as informative.

Duncan Murdoch




On 9/8/2008 11:40 AM, Jason Thibodeau wrote:
> I apologize, I forgot to type the title.
> 
> On Mon, Sep 8, 2008 at 11:39 AM, Jason Thibodeau <jbloudg20 at gmail.com> wrote:
> 
>> Hello all,
>>
>> I have a very large file (280k lines) containing three comma separated
>> variables. The first variable is a 0 or 1 depicting a pass or fail. The
>> other two are X and Y coordinates. Is there a good way I can represent this
>> data in a chart/plot form other than using a 3d histogram? If I need to use
>> the histogram, should I base my chart off the example contained in the RGL
>> package?
>>
>> Thanks a lot.
>>
>> --
>> Jason Thibodeau
>> ECE Dept., University of Connecticut
>> 371 Fairfield Way, Storrs, CT 06269
>> Phone: 860-486-5274 , Fax: 860-486-2447
>> Email: jpt03002 at engr.uconn.edu
>> URL: www.engr.uconn.edu/~jpt03002 <http://www.engr.uconn.edu/%7Ejpt03002>
>>
> 
> 
>