[R] R for large data sets

Andrew Perrin andrew_perrin at unc.edu
Wed Jan 16 19:18:37 CET 2002


I agree with this thread, but would also note that there's an open-source
database server available that's free (unlike Oracle) and quite powerful
(unlike MySQL [IMHO]). PostgreSQL (www.postgresql.org) is what I use for a
database backend, and it's amazingly impressive. I'm working, at the
moment, with a relational database that has, in total, over 10 million
records of different sorts, and it's quite serviceable.

Just my 2c.

ap

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
 Assistant Professor of Sociology, U of North Carolina, Chapel Hill
      269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA


On Wed, 16 Jan 2002, José Ernesto Jardim wrote:

> Something like
> 
> select * from mat where col1 > 0
> 
> The difference is that in sql tables, columns must have names, so 
> instead of using a relative reference like mat[,1] you should use the 
> name of that column. The rest is very intuitive. The SQL language is 
> more "human like" than R/S so it becomes easier to work with subsets.
> 
> I think that everyone who works with the S language will learn the 
> basics of SQL very fast and will gain a lot in working with large datasets.
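
As a rough sketch of the small R-to-SQL dictionary asked about further down,
here are a few common subsetting idioms with a SQL counterpart in the
comments. The matrix and the column names col1 and col2 are made up for
illustration, and the SQL assumes a table called mat with those two columns.

```r
# Hypothetical example data: a numeric matrix with named columns.
# col1 is -1, 2, 0, 5 and col2 is 10, 20, 30, 40.
mat <- matrix(c(-1, 2, 0, 5,
                10, 20, 30, 40),
              ncol = 2,
              dimnames = list(NULL, c("col1", "col2")))

# R:   rows where the first column is positive, all columns kept
# SQL: SELECT * FROM mat WHERE col1 > 0
pos_rows <- mat[mat[, "col1"] > 0, , drop = FALSE]

# R:   a single column for those same rows
# SQL: SELECT col2 FROM mat WHERE col1 > 0
pos_col2 <- mat[mat[, "col1"] > 0, "col2"]

# R:   two conditions combined
# SQL: SELECT * FROM mat WHERE col1 > 0 AND col2 < 40
both <- mat[mat[, "col1"] > 0 & mat[, "col2"] < 40, , drop = FALSE]
```

One difference to keep in mind: R indexing preserves row order, while a
SELECT without ORDER BY does not guarantee any particular order.
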
> 
> Take a look at http://www.sqlcourse.com
> 
> Regards
> 
> EJ
> 
> Agustin Lobo wrote:
> 
> >This is really elegant, Ernesto. The only problem is
> >getting used to the database language also. Do you 
> >have a sort of small dictionary R-MySQL
> >for the (few) subsetting procedures that we
> >commonly use in R? For example,
> >how would you say mat[mat[,1]>0,] in
> >MySQL? 
> >
> >Agus
> >
> >On 16 Jan 2002, Ernesto Jardim wrote:
> >
> >>Hi
> >>
> >>I'm using some large datasets and I found the ROracle package to be of
> >>great help.
> >>
> >>If you have the chance to create a database in Oracle or MySQL with one
> >>single table for your dataset, you can then use the ROracle package to
> >>access the dataset. I found several advantages in that.
> >>
> >>I don't import the data into my environment. I use a small function (see
> >>below) to access the dataset and because the result is a data.frame you
> >>can use it as usual.
> >>
> >>Your environment will not be too large and you'll use less RAM.
> >>
> >>It's easier to select subsets with SQL than with the S/R language.
> >>
> >>Hope it helps
> >>
> >>Regards
> >>
> >>EJ
> >>
> >>--//--
> >>
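> >>ora.fun <- function(){
> >>
> >>        library(ROracle)
> >>        # load the Oracle driver and open a connection
> >>        m <- dbManager("Oracle")
> >>        con <- dbConnect(m,user="user",password="password")
> >>        # run the query; the result comes back as a data.frame
> >>        dat <- quickSQL(con,"select ...")
> >>        # release the connection and the driver
> >>        close(con)
> >>        unload(m)
> >>        dat
> >>
> >>}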
> >>
> >>--//--
> >>
> >>On Tue, 2002-01-15 at 19:43, Prof Brian Ripley wrote:
> >>
> >>>On Tue, 15 Jan 2002, wei, xiaoyan wrote:
> >>>
> >>>>As a part of our regular data analysis, I have to read in large data sets
> >>>>with six columns and about a million rows. In Splus, this usually takes a
> >>>>couple of minutes. I just tried R, and it seems to take forever to use
> >>>>read.table() to read in the data frame! It did not help much even though
> >>>>I specified colClasses and nrows in read.table().
> >>>>
> >>>>How is R's ability to analyze large data sets? I used R on solaris 2.6 and I
> >>>>used all default compilation flags when building the R package. Will it help
> >>>>if I use some compilation flags with higher optimization level?
> >>>>
> >>>It will help to use R-patched, since I guess you are using 1.4.0.
> >>>Also, look in the list archives, as I answered this more fully earlier
> >>>today.
> >>>
> >>>In either S-PLUS or R, scan would be a better choice for such a dataset.
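
A minimal sketch of the scan approach, assuming a whitespace-separated file
with no header, one integer column and five numeric columns. The file, its
size and the column names are invented here so that the example runs on its
own; the real data set would have about a million rows.

```r
# Write a small hypothetical six-column file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
n <- 1000
demo <- data.frame(id = seq_len(n),
                   a = runif(n), b = runif(n), c = runif(n),
                   d = runif(n), e = runif(n))
write.table(demo, path, row.names = FALSE, col.names = FALSE)

# Give scan() the type of every column up front via 'what'.
# When 'what' is a list, 'nmax' is the maximum number of records to read.
cols <- scan(path,
             what = list(id = integer(), a = numeric(), b = numeric(),
                         c = numeric(), d = numeric(), e = numeric()),
             nmax = n, quiet = TRUE)

# scan() returns a list of equal-length vectors; turn it into a data frame.
dat <- as.data.frame(cols)

unlink(path)
```

Declaring the column types in advance means scan does not have to guess
them, which is a plausible reason it copes better with files of this shape.
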
> >>>
> >>>-- 
> >>>Brian D. Ripley,                  ripley at stats.ox.ac.uk
> >>>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >>>University of Oxford,             Tel:  +44 1865 272861 (self)
> >>>1 South Parks Road,                     +44 1865 272860 (secr)
> >>>Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >>>
> >>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >>>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> >>>Send "info", "help", or "[un]subscribe"
> >>>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> >>>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >>>
> >>
> >>
> 
> 
> 



