[R] Can't import this 4GB DATASET

Jan van der Laan rhelp at eoos.dds.nl
Fri May 4 21:01:20 CEST 2012


OK, so not all lines have the same length, but most do. Perhaps you
could write the lines whose length differs from 97 characters to a
separate file and take a closer look at them. A modified version of
the previous code (again, not tested):

con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip the first 5 lines (read and discard them)
invisible(readLines(con, n = 5))
# read the rest in blocks of 100,000 lines
while (TRUE) {
    lines <- readLines(con, n = 1e5)
    # stop at the end of the file
    if (length(lines) == 0) break
    # keep only the lines that are not 97 characters long
    strangelines <- lines[nchar(lines) != 97]
    writeLines(strangelines, con = out)
}
close(con)
close(out)
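
If it would also help to know where in the file the odd lines sit, a
variant along the following lines could write each one together with
its line number. This is only a sketch: the function name
find_strange_lines and its arguments are made up for illustration, and
the defaults (97 characters, 5 header lines, blocks of 100,000 lines)
are simply taken from the code above.

```r
# Hypothetical helper, not part of the original code: copies every line
# whose length differs from `expected` to `outfile`, prefixed with its
# line number in `infile` and a tab. Returns the number of lines written.
find_strange_lines <- function(infile, outfile, expected = 97,
                               skip = 5, block = 1e5) {
    con <- file(infile, "rt")
    on.exit(close(con), add = TRUE)
    out <- file(outfile, "wt")
    on.exit(close(out), add = TRUE)

    # read and discard the header lines; remember how many were read
    offset <- length(readLines(con, n = skip))
    total <- 0L
    repeat {
        lines <- readLines(con, n = block)
        if (length(lines) == 0) break
        # positions within this block of the lines with an unexpected length
        odd <- which(nchar(lines) != expected)
        if (length(odd) > 0) {
            writeLines(paste(offset + odd, lines[odd], sep = "\t"), con = out)
        }
        # advance the running line counter past this block
        offset <- offset + length(lines)
        total <- total + length(odd)
    }
    total
}
```

It could then be called as
find_strange_lines("dataset.txt", "strangelines.txt"). Both connections
are closed when the function exits, including when it stops with an
error partway through the file.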

Jan



Quoting iliketurtles <isaacm200 at gmail.com>:

> Jan, thank you.
>
>> table(line_sizes)
> line_sizes
>        0        1       97      256
>     1430     2860 46869069     1430
>
> -----
> ----
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS


