[R] sequential processing

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jan 23 09:12:12 CET 2007


On Mon, 22 Jan 2007, Gerard Smits wrote:

> So, I take it, given that the use of a pipe is suggested for
> sequential reading, that the standard approach to processing a data
> frame is to load the entire file?  Please correct if wrong.

Yes, because most data frames are tiny compared to current RAM sizes.
But the fact that R has connections and lots of means to read from them indicates
that other approaches are also supported.  Large datasets are often
kept in DBMSs, and data transferred to R as required.
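
As a minimal sketch of reading through a connection in pieces rather than
loading everything at once: the file, the column names (id, x) and the chunk
size of 4 lines below are all made up for illustration, and a real file would
want a far larger chunk size.

```r
# Hypothetical example file: a small CSV written to a temporary location.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, x = (1:10) * 2), tmp, row.names = FALSE)

con <- file(tmp, open = "r")            # open the connection once
header <- readLines(con, n = 1)         # consume the header line
col_names <- gsub('"', "", strsplit(header, ",")[[1]])

total <- 0                              # running sum of column x
rows  <- 0                              # running count of data rows
repeat {
  lines <- readLines(con, n = 4)        # next chunk of at most 4 lines
  if (length(lines) == 0) break         # nothing left to read
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  total <- total + sum(chunk$x)
  rows  <- rows + nrow(chunk)
}
close(con)
unlink(tmp)
```

Because the connection is opened once and stays open, each readLines() call
resumes where the previous one stopped, so only one chunk is held in memory
at a time.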

There is an 'R Data Import/Export' manual, and this would have illuminated 
the subject for you.


> BTW, I am not interested in finding direct translations of SAS data
> step statements to R, but instead in finding an approach by which I
> can address the type of problems I consistently have to deal with
> (grouped processing with retention of baseline records, etc.).  I'll
> read more on the indexing as a means of dealing with relative position issues.
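
For what it is worth, here is one possible sketch of grouped processing with
a retained baseline done by indexing.  The data frame d and its columns (id,
visit, y) are hypothetical, and this is only one of several ways to do it,
not a translation of any particular DATA step.

```r
# Hypothetical long-format data: several visits per subject.
d <- data.frame(id    = c(1, 1, 1, 2, 2, 3),
                visit = c(0, 1, 2, 0, 1, 0),
                y     = c(10, 12, 15, 20, 19, 7))
d <- d[order(d$id, d$visit), ]          # sort by subject, then by visit

n     <- seq_len(nrow(d))                       # row counter, cf. _N_
first <- !duplicated(d$id)                      # TRUE on first row per id, cf. first.id
last  <- !duplicated(d$id, fromLast = TRUE)     # TRUE on last row per id, cf. last.id

# "Retain" the baseline: look up each subject's first-row y by matching ids.
base     <- d$y[first][match(d$id, d$id[first])]
d$change <- d$y - base                  # change from baseline on every row
```

The logical vectors first and last can be used directly as row indices, e.g.
d[last, ] gives the final record for each subject.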
>
> Thanks,
>
> Gerard
>
>
>
>> You could also load the entire file into a DBMS then pull parts of it
>> into R, or read specific lines through a pipe e.g.
>> readLines(pipe("sed, grep, python... command")).
>>
>> Don't try to replicate the SAS processing into R. The exact
>> translations of the SAS DATA STEP usage of _N_, first., last., retain
>> etc into R would be: inefficient, ugly, retrogressive, wrong, rigid,
>> complicated, silly and so on. For a start, read up on indexing - this
>> seemingly simple and innocuous R feature is in fact far more powerful
>> than the entire DATA STEP with its whole bag of tricks. Then search
>> the list for similar questions, for example
>> http://thread.gmane.org/gmane.comp.lang.r.general/44332/focus=44343
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gerard Smits
>>> Sent: Sunday, January 21, 2007 2:22 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] sequential processing
>>>
>>> Like many others, I am new to R but old to SAS.
>>>
>>> Am I correct in understanding that R processes a data frame
>>> sequentially?  This would imply that large input files could be
>>> read, without the need to load the entire file into memory.
>>> Related to the manner of reading a frame, I have been looking for the
>>> equivalent of SAS _n_ (I realize that I can use a variant of which to
>>> identify an index value) as well as useful SAS features such as
>>> first., last., retain, etc.  Any help with this conversion
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Gerard Smits
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


