[R] Reading a file line by line - separating lines VS separating columns

Thu Mar 19 02:28:44 CET 2009

You can do something like this using connections: read in a set of
lines at a time and save the results in bigmemory or, as in this case,
in a 'save' image:

zz <- file("ex.data", "w") # open an output file
# write 10000 identical tab-separated lines; writeLines() ends each with "\n"
for (i in 1:10000) writeLines("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", zz)
close(zz)

# read in the data 876 lines at a time and write out an image
zz <- file("ex.data", "r")
fileNo <- 1
repeat{
    gotError <- 1   # set to 2 if there is an error
    # catch the error raised when there is no more data
    tryCatch(input <- read.table(zz, nrows=876, sep='\t'),
             error=function(x) gotError <<- 2)
    if (gotError == 2) break
    # save the intermediate data
    save(input, file=sprintf("file%03d.RData", fileNo))
    fileNo <- fileNo + 1
}
close(zz)
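
On the "separating lines VS separating columns" part of the question, here is a minimal sketch of the same chunked idea using readLines() and strsplit() instead of read.table(). The function name read_in_chunks and its arguments are made up for illustration, and it assumes every line has the same number of fields. Note that strsplit() drops a trailing empty field, so lines ending in the separator yield one column fewer than read.table() would give.

```r
# Hypothetical helper (not from the original post): read a delimited file
# a fixed number of lines at a time and split each chunk into columns.
read_in_chunks <- function(path, chunk_size = 876L, sep = "\t") {
    con <- file(path, "r")
    on.exit(close(con))
    chunks <- list()
    repeat {
        lines <- readLines(con, n = chunk_size)
        if (length(lines) == 0L) break      # no more data
        # split every line into fields, then bind all rows in one call
        # instead of growing a matrix with rbind() inside a loop
        fields <- strsplit(lines, sep, fixed = TRUE)
        chunks[[length(chunks) + 1L]] <- do.call(rbind, fields)
    }
    chunks
}

# small demonstration on a temporary file with three fields per line
tmp <- tempfile()
writeLines(rep("1\t2\t3", 10), tmp)
chunks <- read_in_chunks(tmp, chunk_size = 4L)
length(chunks)      # 3 chunks: 4 + 4 + 2 lines
dim(chunks[[1]])    # 4 rows, 3 columns
unlink(tmp)
```

For a file that does not fit in memory you would not keep the chunks in a list as done here; each chunk would be saved or moved into bigmemory/ff before reading the next one.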

On Wed, Mar 18, 2009 at 7:17 PM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello all.
>
> I wish to read a large data set into R.  My current issue is getting the
> data into a form that R can access.  Using read.table won't work
> since the data is over 1GB in size (and I am using Windows XP), so my plan
> is to read the file chunk by chunk and move each chunk into bigmemory
> (I'll experiment with that when the time comes; maybe ff is better?).
>
> I ran into a problem with separating lines versus separating columns.  I
> found a solution, but it doesn't feel like a smart one; any
> ideas or help on how to improve it would be welcome.
>
>
>
> # sample code:
>
> # creating a simple file
> zz <- file("ex.data", "w") # open an output file connection
> cat( "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
> cat( "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
> cat( "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
> (temp.file = scan("ex.data", what = "", sep = "\n"))
> # here we can limit the amount of rows we want to use and start from a
> # specific row using skip
> # or:
> #(aa = readLines("ex.data"))
> str(aa) # we get a vector of character
> new.df <- NULL
> # we go through the vector to split the columns
> for(i in 1:length(aa)) {
>   new.df <- rbind(new.df, unlist(strsplit(temp.file[i], "\t")))
> }
> new.df
> # or maybe
> apply(as.data.frame(temp.file), 1, function(b) unlist(strsplit(b, "\t")))
> # but this transposes the matrix
>
>
> Thanks,
> Tal

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?