[R] Handling 8GB .txt file in R?
R. Michael Weylandt
michael.weylandt at gmail.com
Sat Mar 24 14:20:54 CET 2012
Despair not! Malcolm Gladwell would say you are 1/10 of the way to
becoming the next MozaRt!
You need to tell us how your data set is laid out. Your problem with ff
seems to be that the lines do not all have the same number of fields
(scan() complains that line 5 did not have 9 elements): if the file isn't
in a consistent CSV format, I wouldn't be surprised if a CSV splitter
had problems with it as well. If you are on a Unix-alike system,
the splitting could be done pretty easily with awk/sed/perl,
but you need to define your problem much more clearly. If things
aren't nicely structured, you will almost certainly benefit from doing
a little bit of data preparation work with Unix utilities before
loading into R.
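As a rough sketch of that kind of preparation, something along these lines might work. Everything specific here is an assumption for illustration: the file name sample.txt, the comma delimiter, and the expected count of 3 fields (your scan() error suggests 9 for the real file). It writes its output files into the current directory.

```shell
#!/bin/sh
set -eu

# Hypothetical 5-line sample standing in for the real 8GB file.
cat > sample.txt <<'EOF'
a,b,c
1,2,3
4,5
6,7,8
9,10,11,12
EOF

# Tally how many lines have each field count (NF is awk's per-line field count).
awk -F',' '{ n[NF]++ } END { for (k in n) print k, n[k] }' sample.txt \
  | sort -n > field_counts.txt
cat field_counts.txt

# Keep lines with the expected number of fields in clean.txt;
# set the rest aside in bad.txt, prefixed with their line numbers.
awk -F',' -v want=3 '
  NF == want { print > "clean.txt"; next }
             { print NR ": " $0 > "bad.txt" }
' sample.txt

# Split the cleaned file into pieces of 2 lines each: chunk_aa, chunk_ab, ...
split -l 2 clean.txt chunk_
```

The tally tells you whether the odd lines are a handful of strays or a sign that the delimiter guess is wrong. For the real file you would raise the split size to a few million lines, and keep in mind that only the first piece carries the header row.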
On Sat, Mar 24, 2012 at 4:08 AM, iliketurtles <isaacm200 at gmail.com> wrote:
> I am mediocre at R, maybe 1000 hours experience, but I received an 8GB
> dataset and I don't know what to do with it. I have to do extensive analysis
> over it for my Honours thesis.
> I can't even import it. I've tried:
> - Splitting it up using the free csv-splitter-1.1.zip that seems to be
> working for everyone else (it doesn't work for me, it just outputs 1 single
> - Splitting it with Text Splitter doesn't work because you have to load it
> into memory first.
> - Importing using BigMemory's big.matrix(), however my computer just
> - Importing using ff's read.table.ffdf(), however I get the error message
> " in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> line 5 did not have 9 elements"
> Thanks for any ideas and assistance.
> Can R do this on a computer with 4 GB of memory and a dual core i5xx ?
> Research Assistant
> Quantitative Finance Faculty, UTS