[R] Reading parts of data files

Tue Nov 23 15:57:57 CET 2010

On Tue, Nov 23, 2010 at 6:05 AM, fbielejec <fbielejec at gmail.com> wrote:
> Dear,
>
> I'm doing an analysis where I need to work with relatively large (50-60 MB)
> text files, although I'm only interested in the columns holding binary
> variables (named indicators1, indicators2, etc.).
>
> Every text file contains other numeric columns, but not always the same
> ones and not always in the same order. I therefore need a method that
> connects to the file and reads only the columns whose names match a
> pattern (i.e. indicators + number). That should speed things up (right
> now I have to clean the data by hand) and also leave a smaller memory
> footprint. Could you point me towards something?
>

This is easy using read.csv.sql from the sqldf package:

library(sqldf)

# create test file
write.table(anscombe, "anscombe.csv", sep = ",", quote = FALSE,
            row.names = FALSE)

# read it back, keeping only the named columns
read.csv.sql("anscombe.csv", sql = "select x1, x2, y1, y2 from file")

See ?read.csv.sql and also the sqldf home page at http://sqldf.googlecode.com
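The select statement above names the columns explicitly, while the question only knows a name pattern. One possible approach, sketched below as an illustration rather than a tested recipe, is to read just the header line, pick the matching names with grepl(), and then skip every other column. This sketch swaps in base R's colClasses = "NULL" (which makes read.table drop a column while reading) instead of sqldf. The function name read_matching_cols and the default pattern "^indicators[0-9]+$" are hypothetical, guessed from the naming scheme described in the question; it also assumes a comma-separated file with a header row.

```r
# Hypothetical helper: read only the columns whose names match a regex.
# Assumes a delimited text file with a header row; the default pattern
# "^indicators[0-9]+$" is a guess at the naming scheme in the question.
read_matching_cols <- function(file, pattern = "^indicators[0-9]+$", sep = ",") {
  # Read the header plus one data row just to learn the column names
  hdr <- names(read.table(file, sep = sep, header = TRUE, nrows = 1,
                          check.names = FALSE))
  keep <- grepl(pattern, hdr)
  if (!any(keep)) stop("no column names match the pattern")
  # NA lets read.table pick the type; "NULL" skips the column entirely
  classes <- ifelse(keep, NA_character_, "NULL")
  read.table(file, sep = sep, header = TRUE, colClasses = classes,
             check.names = FALSE)
}

# Usage with the same anscombe test file: keep only x1, x2, x3, x4
write.table(anscombe, "anscombe.csv", sep = ",", quote = FALSE,
            row.names = FALSE)
xs <- read_matching_cols("anscombe.csv", pattern = "^x[0-9]+$")
str(xs)
```

The matched names (hdr[keep]) could equally be pasted into a "select ... from file" string and passed to read.csv.sql if you prefer to stay with sqldf.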

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com