[R] Read big data (>3G ) methods ?

Jan van der Laan rhelp at eoos.dds.nl
Sat Apr 27 17:51:27 CEST 2013


I believe it was already mentioned, but I can recommend the LaF package 
(I am not completely impartial, being the maintainer of LaF ;-)

However, the speed differences between packages will not be very large. 
In the end, every package has to read in the 6 GB of data and convert 
the text data to numeric data. So the tricks are to
1. read in only the columns that you need,
2. read in only the lines that you need, and
3. if you need to read the data more than once, convert it to some 
   binary format first (RDS, ff, sqlite, bigmemory, ...). Most packages 
   have routines to convert CSV files to their binary format.
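
As a rough illustration of points 1 and 2, here is a minimal sketch in base R 
(not LaF). The temporary file and the column names id, x, label and y are 
made up for the example; a real file would need its own names and types.

```r
# Hypothetical stand-in for the large CSV file (names and sizes are made up).
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000,
                     x = rnorm(1000),
                     label = sample(letters, 1000, replace = TRUE),
                     y = runif(1000)),
          csv_file, row.names = FALSE)

# Point 1: a colClasses entry of "NULL" tells read.csv to skip that column,
# so 'label' and 'y' are never converted or stored.
col_names   <- c("id", "x", "label", "y")
col_classes <- c("integer", "numeric", "NULL", "NULL")

# Point 2: 'skip' and 'nrows' restrict which lines are read. Here we skip
# the header line plus the first 100 data lines, then read 50 lines.
# Because the header is skipped, the column names are supplied by hand.
part <- read.csv(csv_file, header = FALSE, skip = 1 + 100, nrows = 50,
                 col.names = col_names, colClasses = col_classes)

str(part)
```

Giving colClasses explicitly also spares R from guessing the column types, 
which usually helps on large files.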

LaF helps with all of the above. ffbase contains a routine, laf_to_ffdf, 
to convert to ff format.
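
For point 3, the simplest binary format is probably RDS, as Duncan suggests 
below. A minimal base R sketch of the convert-once idea follows; the file 
names and the two example columns are hypothetical.

```r
# Hypothetical file locations; in practice these would be fixed paths.
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(data.frame(id = 1:1000, x = rnorm(1000)),
          csv_file, row.names = FALSE)

# Slow step, done once: parse the text file and store the result in binary.
dat <- read.csv(csv_file, colClasses = c("integer", "numeric"))
saveRDS(dat, rds_file)

# Every later session: read the binary file, no text-to-number conversion.
dat2 <- readRDS(rds_file)

# The round trip through RDS reproduces the data frame exactly.
identical(dat, dat2)
```

Note that readRDS() still loads the whole object into memory; ff, sqlite or 
bigmemory are the options when the data do not fit.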


HTH,

Jan



On 04/27/2013 04:34 AM, Kevin Hao wrote:
> Thank you very much.
>
> More and more methods are coming. That sounds great!
>
>
> Thanks,
>
> kevin
>
>
>
> On Fri, Apr 26, 2013 at 7:51 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
>> On 13-04-26 3:00 PM, Kevin Hao wrote:
>>
>>> Hi Ye,
>>>
>>> Thanks.
>>>
>>> That is a good method. Are there any other methods besides using a database?
>>>
>>
>> If you know the format of the file, you can probably write something in C
>> (or other language) that is faster than R.  Convert your .csv file to a
>> nice binary format, and R will read it in no time at all.
>>
>> If writing it in C is hard, then R is probably a better use of your time.
>> Read the file once, write it out using saveRDS(), and read it in using
>> readRDS() after that.
>>
>> In either case, the secret is to do the conversion from ugly character
>> encoded numbers to beautiful binary numbers just once.
>>
>> Duncan Murdoch
>>
>>
>>
>>> kevin
>>>
>>>
>>> On Fri, Apr 26, 2013 at 1:58 PM, Ye Lin <yelin at lbl.gov> wrote:
>>>
>>>> Have you thought of building a database and then letting R read the data
>>>> through that database instead of from your desktop?
>>>>
>>>>
>>>> On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all scientists,
>>>>>
>>>>> Recently I have been dealing with big data (>3 GB, txt or csv format) on
>>>>> my desktop (Windows 7, 64-bit), but I cannot read the files any faster,
>>>>> though I have searched the internet. [I have defined colClasses for
>>>>> read.table and tried the colbycol and limma packages, but it is still
>>>>> not very fast.]
>>>>>
>>>>> Could you share your methods for reading big data into R faster?
>>>>>
>>>>> Though this is an odd question, we really need it.
>>>>>
>>>>> Any suggestions are appreciated.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>>
>>>>> kevin
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>


