[R] R tools for large files

Andrew C. Ward s195404 at student.uq.edu.au
Mon Aug 25 07:37:28 CEST 2003


Dear Murray,

Perhaps if you gave an example of what you actually wish to
do, and why, you might get more useful advice. If the data
fit easily into memory in R, then you could do the
subsetting there. Otherwise, the external database approach
is good. Which is better depends a bit on what resources you
have available and how often you need to repeat the task.
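
For the two tasks in the original question (counting the
lines in a file, and forming a data frame from the lines
whose numbers are in a supplied vector), a minimal base-R
sketch might look like the following. The helper names
count_lines and read_selected_lines, the chunk size, and the
small temporary demo file are all assumptions made for
illustration. Both helpers read the file in chunks, so the
whole file is never held in memory at once.

```r
# Count lines by reading the file in chunks of `chunk` lines.
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break
    n <- n + got
  }
  n
}

# Keep only the lines whose numbers are in `wanted`, then parse
# the kept lines as CSV. Extra arguments go to read.csv().
# Assumes at least one wanted line exists in the file.
read_selected_lines <- function(path, wanted, chunk = 10000L, ...) {
  wanted <- sort(unique(wanted))
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  offset <- 0  # number of lines read so far
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    # Line numbers that fall inside this chunk, made chunk-relative.
    idx <- wanted[wanted > offset & wanted <= offset + length(lines)] - offset
    kept <- c(kept, lines[idx])
    offset <- offset + length(lines)
  }
  read.csv(text = kept, header = FALSE, ...)
}

# Hypothetical demo data: 25 records, two columns, no header row.
demo_file <- tempfile(fileext = ".csv")
write.table(data.frame(id = 1:25, x = (1:25) * 2), demo_file,
            sep = ",", row.names = FALSE, col.names = FALSE)

n_lines <- count_lines(demo_file, chunk = 10L)
picked  <- read_selected_lines(demo_file, c(3, 12, 25), chunk = 10L)
```

A random subfile of a given size could then be drawn with
something like read_selected_lines(path, sample(n_lines, 5)).
For a file with a header row, the line numbers would need to
be shifted by one and the header handled separately.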


Regards,

Andrew C. Ward

CAPE Centre
Department of Chemical Engineering
The University of Queensland
Brisbane Qld 4072 Australia
andreww at cheque.uq.edu.au


Quoting Murray Jorgensen <maj at stats.waikato.ac.nz>:

> Andrew,
> 
> This is no doubt true, but some things in R work very
> well with big files without the need for any extra
> software:
> 
> readLines("c:/data/perry/data.csv", n=12)
> # prints out the first 12 lines as strings
> 
> flows <- read.csv("c:/data/perry/data.csv", na.strings="?",
>                   header=FALSE, nrows=1000)
> # makes a data frame from the first 1000 records
> 
> I would like to get some solution where I don't find
> myself generating large numbers of derived files from
> the original data file.
> 
> Murray
> 
> 
> Andrew C. Ward wrote:
> > Dear Murray,
> > 
> > One way that works very well for many people (including me)
> > is to store the data in an external database, such as MySQL,
> > and read in just the bits you want using the excellent
> > package RODBC. Getting a database to do all the selecting
> > is very fast and efficient, leaving R to concentrate on the
> > analysis and visualisation. This is all described in the
> > R Data Import/Export manual.
> > 
> > 
> > Regards,
> > 
> > Andrew C. Ward
> > 
> > 
> > Quoting Murray Jorgensen <maj at stats.waikato.ac.nz>:
> > 
> > 
> >>I'm wondering if anyone has written some functions or
> >>code for handling very large files in R. I am working
> >>with a data file that is 41 variables times who knows
> >>how many observations, making up 27MB altogether.
> >>
> >>The sort of thing that I am thinking of having R do is
> >>
> >>- count the number of lines in a file
> >>
> >>- form a data frame by selecting all cases whose line
> >>numbers are in a 
> >>supplied vector (which could be used to extract random
> >>subfiles of 
> >>particular sizes)
> >>
> >>Does anyone know of a package that might be useful for
> >>this?
> >>
> >>Murray
> >>
> >>-- 
> >>Dr Murray Jorgensen     
> >>http://www.stats.waikato.ac.nz/Staff/maj.html
> >>Department of Statistics, University of Waikato,
> >>Hamilton, New Zealand
> >>Email: maj at waikato.ac.nz
> >>Fax 7 838 4155
> >>Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862
> >>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>
> > 
> > 
> > 
> 



