[R] read.table() 1Gb text dataframe

EMISHIA PERSIOUS emishiapersious57 at yahoo.com
Fri Sep 19 05:40:20 CEST 2014


r code for the packages cstmr ,gRain,gRc,gRim,gRbase using probability models,


On Friday, September 19, 2014 7:05 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
 


As a start, make sure you specify the 'colClasses' argument.  BTW,
using that you can even go to the extreme and read one column at the
time, if it comes down to that.

To read a 10% subset of the rows, you can use R.filesets as:

library(R.filesets)
db <- TabularTextFile(pathname)
n <- nbrOfRows(db)
data <- readDataFrame(db, rows=seq(from=1, to=n, length.out=0.10*n))

It is also useful to specify 'colClasses' here. In addition to
specifying them ordered by column, as for read.table(), you also
specify them by column names (or regular expressions of the column
names), e.g.

data <- readDataFrame(db, colClasses=c("*"="NULL", "(x|y)"="integer",
outcome="numeric", "id"="character"), rows=seq(from=1, to=n,
length.out=0.10*n))

That 'colClasses' specifies that the default is drop all columns, read
columns 'x' and 'y' as integers, and so on.

BTW, if you know 'n' upfront you can skip the setup of TabularTextFile
and just do:

data <- readDataFrame(pathname, rows=seq(from=1, to=n, length.out=0.10*n))


Hope this helps

Henrik

On Thu, Sep 18, 2014 at 4:48 PM, Stephen HK Wong <honkit at stanford.edu> wrote:
> Dear All,
>
> I have a table of 4 columns and many millions rows separated by tab-delimited. I don't have enough memory to read.table in that 1 Gb file. And actually I have 12 text files like that. Is there a way that I can just randomly read.table() in 10% of rows ? I was able to do that using colbycol package, but it is not not available. Many thanks!!
>
>
>
> Stephen HK Wong
> Stanford, California 94305-5324
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
	[[alternative HTML version deleted]]



More information about the R-help mailing list