[R] Re: large survey data

Mark Myatt mark at myatt.demon.co.uk
Mon Jul 16 12:07:56 CEST 2001


>Michał Bojanowski <bojanr at wp.pl> writes:
>
> Recently I came across a problem. I have to analyze a large survey 
> data set, about 600 columns and 10000 rows (a tab-delimited file 
> with names in the header). I was able to import the data into an 
> object, but then there is no memory left.
> 
> Is there a way to import the data column by column? I have to analyze 
> the whole data set, but only two variables at a time.

Something like this:

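         # Read the first five fields of each record, then keep columns 2 and 5.
         a <- scan("file.dat",
                   skip = 1,                    # drop the header line
                   what = list(0, 0, 0, 0, 0),  # five numeric fields
                   flush = TRUE)[c(2, 5)]       # ignore the rest of each line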

         a <- cbind(unlist(a[1]), unlist(a[2]))

might do the trick. This reads columns 2 and 5; change the index
'[c(2,5)]' to get other columns. The options to scan() are: 'skip = 1'
drops the first line of the file (the header); 'what' is a list giving
the type of each field (I specify 5 numeric columns; you need to specify
every column up to the last one you want); 'flush' discards whatever
remains on each line after the last field listed in 'what', which speeds
the whole thing up and saves memory.
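
If memory is the main concern, a variation that may help is to put NULL
in 'what' for the fields you do not want, since scan() skips those fields
instead of storing them. The sketch below assumes a tab-delimited file of
numeric columns with one header line; read_columns() is a made-up helper
name for illustration, not part of R:

```r
# Hypothetical helper: read only the columns in 'cols' from a delimited file.
# Assumes one header line and numeric data in the requested columns.
read_columns <- function(file, cols, sep = "\t") {
  # Column names come from the header line.
  header <- scan(file, what = "", sep = sep, nlines = 1, quiet = TRUE)

  # One template entry per field up to the last wanted column.
  # Entries left as NULL tell scan() to skip that field.
  what <- vector("list", max(cols))
  what[cols] <- list(0)

  fields <- scan(file, what = what, sep = sep, skip = 1,
                 flush = TRUE, quiet = TRUE)

  # Keep the requested columns and label them from the header.
  out <- fields[cols]
  names(out) <- header[cols]
  as.data.frame(out)
}
```

With this, read_columns("file.dat", c(2, 5)) should return a two-column
data.frame named from the header, and calling it again with another pair
of column numbers reads the next pair without holding all 600 columns in
memory at once.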

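The cbind() just converts the list returned by scan() to a matrix. If you
would rather have a data.frame, use this in place of the cbind() line
(not after it, since 'a' must still be the list from scan()):

         a <- as.data.frame(cbind(unlist(a[1]), unlist(a[2])))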
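
Another option, offered only as a sketch: names given to the components
of 'what' are carried over to the list that scan() returns, so the result
can go straight into as.data.frame() with labelled columns. The temporary
file and the names 'x' and 'y' below are invented for the example; the
file stands in for "file.dat":

```r
# Tiny stand-in for "file.dat" (hypothetical data): a header plus two rows.
path <- tempfile(fileext = ".dat")
writeLines(c("v1\tv2\tv3\tv4\tv5",
             "1\t2\t3\t4\t5",
             "6\t7\t8\t9\t10"), path)

# Components 2 and 5 of 'what' are named; scan() keeps those names.
a <- scan(path, skip = 1, sep = "\t", flush = TRUE, quiet = TRUE,
          what = list(0, x = 0, 0, 0, y = 0))[c(2, 5)]

# A named list of equal-length vectors converts directly to a data.frame.
d <- as.data.frame(a)
```

Here 'd' has the two columns 'x' and 'y', so no cbind() or unlist() step
is needed.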

I hope that helps.


--
Mark Myatt




