[R] Read big data (>3G) methods?

Duncan Murdoch murdoch.duncan at gmail.com
Sat Apr 27 01:51:14 CEST 2013


On 13-04-26 3:00 PM, Kevin Hao wrote:
> Hi Ye,
>
> Thanks.
>
> That is a good method. Are there any other methods besides using a database?

If you know the format of the file, you can probably write something in
C (or another language) that parses it faster than R does.  Convert your
.csv file to a nice binary format, and R will read it in no time at all.
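
A minimal sketch of the reading side, assuming the converter writes a flat
stream of 8-byte doubles in the machine's native byte order.  The file name
and layout are hypothetical, and writeBin() stands in here for the external
C program:

```r
# Hypothetical file; a temp file keeps the sketch self-contained.
bin_path <- tempfile(fileext = ".bin")

# Stand-in for the external converter: dump one million doubles as raw binary.
x <- as.numeric(seq_len(1e6))
writeBin(x, bin_path)

# Reading back needs no text parsing, only a bulk copy of 8-byte values.
n <- file.size(bin_path) / 8
y <- readBin(bin_path, what = "double", n = n)

# Assumed layout: 4 columns stored column by column, which is R's own order.
m <- matrix(y, ncol = 4)

stopifnot(identical(x, y))
```

The reader has to know the element type, count, and byte order in advance,
so the converter and the R code must agree on them.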

If writing it in C is hard, then R is probably a better use of your 
time.  Read the file once, write it out using saveRDS(), and read it in 
using readRDS() after that.
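
A small sketch of that workflow.  The paths and column types are
assumptions, and a tiny temporary CSV stands in for the real multi-gigabyte
file:

```r
# Hypothetical paths; replace with the real input and cache locations.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(data.frame(id = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5)),
          csv_path, row.names = FALSE)

# One-time slow step: parse the text. colClasses avoids type guessing.
dat <- read.csv(csv_path, colClasses = c("integer", "numeric"))

# Save the parsed object in R's binary serialization format.
saveRDS(dat, rds_path)

# In every later session, skip the CSV and load the binary copy.
dat2 <- readRDS(rds_path)
stopifnot(isTRUE(all.equal(dat, dat2)))
```

saveRDS() compresses by default; passing compress = FALSE may trade disk
space for faster writes and reads, which is worth timing on your own data.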

In either case, the secret is to do the conversion from ugly character 
encoded numbers to beautiful binary numbers just once.

Duncan Murdoch

>
> kevin
>
>
> On Fri, Apr 26, 2013 at 1:58 PM, Ye Lin <yelin at lbl.gov> wrote:
>
>> Have you thought of building a database and then letting R read the data
>> through that database instead of from your desktop?
>>
>>
>> On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo at gmail.com> wrote:
>>
>>> Hi all scientists,
>>>
>>> Recently I have been dealing with big data (>3 GB, in txt or csv format)
>>> on my desktop (Windows 7, 64-bit), but I cannot read it quickly, even
>>> though I have searched the internet. [I have tried defining colClasses for
>>> read.table and the colbycol and limma packages, but none is very fast.]
>>>
>>> Could you share your methods for reading big data into R faster?
>>>
>>> This may be an odd question, but we really need an answer.
>>>
>>> Any suggestions are appreciated.
>>>
>>> Thank you very much.
>>>
>>>
>>> kevin
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>


