[R] sequential processing
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jan 23 09:12:12 CET 2007
On Mon, 22 Jan 2007, Gerard Smits wrote:
> So, I take it, given that the use of a pipe is suggested for
> sequential reading, that the standard approach to processing a data
> frame is to load the entire file? Please correct if wrong.
Yes, because most data frames are tiny compared to current RAM sizes.
But the fact that R has connections and lots of means to read from them
indicates that other approaches are also supported. Large datasets are often
kept in DBMSs, and data transferred to R as required.
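
As a minimal sketch of the connection-based approach (not a recommendation for any particular dataset), a file can be read a few lines at a time from an open connection, so that only one chunk is held in memory at once. The temporary file, the column names and the chunk size below are illustrative assumptions.

```r
# Write a small CSV to a temporary file so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, x = (1:10) * 1.5), tmp, row.names = FALSE)

# An open connection remembers its position between reads.
con <- file(tmp, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]
header <- gsub('"', "", header)   # write.csv quotes the column names

chunk_size <- 4   # hypothetical chunk size
total <- 0        # running sum of x across chunks
rows_seen <- 0    # running count of data rows
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$x)
  rows_seen <- rows_seen + nrow(chunk)
}
close(con)
unlink(tmp)
```

The same loop should work with other connection types such as pipe() or gzfile(), since readLines() only needs an open connection.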
There is an 'R Data Import/Export' manual, and this would have illuminated
the subject for you.
> BTW, I am not interested in finding direct translations of SAS data
> step statements to R, but instead in finding an approach by which I
> can address the type of problems I consistently have to deal with
> (grouped processing with retention of baseline records, etc.). I'll
> read more on indexing as a means of dealing with relative position issues.
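
For grouped processing with retention of baseline records, a sketch along the following lines may be a useful starting point. It uses only indexing, duplicated() and match(). The data frame and its column names (id, visit, y) are made up for illustration, and the comments naming SAS constructs describe rough analogues, not exact translations.

```r
# Hypothetical long-format data: several visits per subject.
d <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  visit = c(0, 1, 2, 0, 1, 0),
  y     = c(10, 12, 15, 20, 18, 7)
)
d <- d[order(d$id, d$visit), ]   # sort by subject, then visit

d$row   <- seq_len(nrow(d))                     # rough analogue of _N_
d$first <- !duplicated(d$id)                    # rough analogue of first.id
d$last  <- !duplicated(d$id, fromLast = TRUE)   # rough analogue of last.id

# Carry each subject's baseline (first) y onto all of that subject's rows,
# which is the usual reason for reaching for "retain".
base_ids <- d$id[d$first]
base_y   <- d$y[d$first]
d$base_y <- base_y[match(d$id, base_ids)]
d$change <- d$y - d$base_y
```

Everything here operates on whole columns at once, so no explicit row-by-row loop is needed.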
>> You could also load the entire file into a DBMS then pull parts of it
>> into R, or read specific lines through a pipe e.g.
>> readLines(pipe("sed, grep, python... command")).
>> Don't try to replicate the SAS processing into R. The exact
>> translations of the SAS DATA STEP usage of _N_, first., last., retain
>> etc into R would be: inefficient, ugly, retrogressive, wrong, rigid,
>> complicated, silly and so on. For a start, read up on indexing - this
>> seemingly simple and innocuous R feature is in fact far more powerful
>> than the entire DATA STEP with its whole bag of tricks. Then search
>> the list for similar questions, for example
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gerard Smits
>>> Sent: Sunday, January 21, 2007 2:22 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] sequential processing
>>> Like many others, I am new to R but old to SAS.
>>> Am I correct in understanding that R processes a data frame
>>> sequentially? This would imply that large input files could be
>>> read, without the need to load the entire file into memory.
>>> Related to the manner of reading a frame, I have been looking for the
>>> equivalent of SAS _n_ (I realize that I can use a variant of which to
>>> identify an index value) as well as useful SAS features such as
>>> first., last., retain, etc. Any help with this conversion
>>> Gerard Smits
>>> R-help at stat.math.ethz.ch mailing list
>>> PLEASE do read the posting guide
>>> and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595