[R] reading very large files

juli g. pausas pausas at gmail.com
Sat Feb 3 19:06:01 CET 2007


Thanks so much for your help and comments.
The approach proposed by Jim Holtman was the simplest and fastest. The
approach by Marc Schwartz also worked (after a very small
modification).

It is clear that a good knowledge of R saves a lot of time! I've been
able to do in a few minutes a process that was only a quarter done after
25 hours!
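
For the archives: one way to get there is to read the file sequentially in
chunks instead of re-scanning it from the start for every sampled line. A
minimal sketch of that kind of approach (not the exact code Jim or Marc
posted; the file name and chunk size are just placeholders):

sel <- sort(sample(900000, 3000))        # line numbers to keep
con <- file("myfile", open = "r")
keep <- character(0)
done <- 0                                # lines read so far
repeat {
    chunk <- readLines(con, n = 10000)   # read the next 10,000 lines
    if (length(chunk) == 0) break        # end of file
    hit <- sel[sel > done & sel <= done + length(chunk)]
    if (length(hit) > 0) keep <- c(keep, chunk[hit - done])
    done <- done + length(chunk)
}
close(con)
writeLines(keep, "myfile_short")

This makes a single pass over the file and never holds more than one chunk
in memory at a time.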

Many thanks

Juli


On 02/02/07, juli g. pausas <pausas at gmail.com> wrote:
> Hi all,
> I have a large file (1.8 GB) with 900,000 lines that I would like to read.
> Each line is a character string. Specifically, I would like to randomly
> select 3000 lines. For smaller files, what I'm doing is:
>
> trs <- scan("myfile", what = character(), sep = "\n")
> trs <- trs[sample(length(trs), 3000)]
>
> And this works OK; however, my computer does not seem able to handle the
> 1.8 GB file.
> I thought of an alternative way that does not require reading the whole file:
>
> sel <- sample(1:900000, 3000)
> for (i in 1:3000) {
>   # skip sel[i] - 1 lines so that line sel[i] itself is the one read
>   un <- scan("myfile", what = character(), sep = "\n", skip = sel[i] - 1, nlines = 1)
>   write(un, "myfile_short", append = TRUE)
> }
>
> This works on my computer; however, it is extremely slow, since it reads
> only one line on each pass through the loop. It has been running for 25
> hours and I think it has done less than half of the file (yes, probably I
> do not have a very good computer, and I'm working under Windows ...).
> So my question is: do you know a faster way to do this?
> Thanks in advance
>
> Juli
>
> --
>  http://www.ceam.es/pausas
>


-- 
http://www.ceam.es/pausas


