[R] Large Dataset

Gábor Csárdi csardi at rmki.kfki.hu
Tue Jan 6 18:20:33 CET 2009


For the mean, min, max and standard deviation (which I assume is what
you mean by "standard deviance") you don't need to hold all the data in
memory; you can calculate them incrementally. Read the file line by line
(if it is a text file) and update the running values as you go.
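
A minimal sketch of what this could look like in R, not a tested
solution for your file. It assumes a whitespace-separated text file
without a header line; the function name running_stats, the column
index col and the chunk size are made up for illustration. It reads the
file in chunks through a connection and merges each chunk into running
totals (the usual pairwise update for mean and sum of squares), so only
one chunk is in memory at a time.

```r
# Running n, mean, min, max and sd of one numeric column of a text file.
# Assumptions (hypothetical, not from the thread): whitespace-separated
# fields, no header line, column of interest given by `col`.
running_stats <- function(path, col = 1, chunk_lines = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  n <- 0        # number of values seen so far
  mean_x <- 0   # running mean
  m2 <- 0       # running sum of squared deviations from the mean
  min_x <- Inf
  max_x <- -Inf

  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break

    # Pick out the requested column; blank or short lines become NA.
    fields <- strsplit(trimws(lines), "[[:space:]]+")
    x <- as.numeric(vapply(fields, function(f) f[col], character(1)))
    x <- x[!is.na(x)]
    if (length(x) == 0) next

    # Summaries of this chunk only.
    n_b <- length(x)
    mean_b <- mean(x)
    m2_b <- sum((x - mean_b)^2)

    # Merge the chunk into the running totals.
    delta <- mean_b - mean_x
    n_new <- n + n_b
    mean_x <- mean_x + delta * n_b / n_new
    m2 <- m2 + m2_b + delta^2 * n * n_b / n_new
    n <- n_new

    min_x <- min(min_x, x)
    max_x <- max(max_x, x)
  }

  list(n = n, mean = mean_x, min = min_x, max = max_x,
       sd = if (n > 1) sqrt(m2 / (n - 1)) else NA_real_)
}
```

You would call it as running_stats("yourfile.txt", col = 3), where the
file name and column number are placeholders. Memory use depends on
chunk_lines, not on the size of the file.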

G.

On Tue, Jan 6, 2009 at 6:10 PM, Edwin Sendjaja <edwin7 at web.de> wrote:
> Hi Ben,
>
> Using colClasses doesn't improve the performance much.
>
> With the data, I will calculate the mean, min, max, and standard deviance.
>
> I have also failed to import the data into a MySQL database. I don't have
> much knowledge of MySQL.
>
> Edwin
>
>
>
>> Edwin Sendjaja <edwin7 <at> web.de> writes:
>> > Hi Simon,
>> >
>> > My RAM is only 3.2 GB (actually it should be 4 GB, but my motherboard
>> > doesn't support it).
>> >
>> > R uses almost all of my RAM and half of my swap. I think memory.limit
>> > will not solve my problem. It seems that I need more RAM.
>> >
>> > Unfortunately, I can't buy more RAM.
>> >
>> > Why is R slow at reading big data sets?
>> >
>> > Edwin
>>
>>   Start with FAQ 7.28,
>> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-is-read_002etable_0028_0029-so-inefficient_003f
>>
>>   However, I think you're going to have much bigger problems
>> if you have a 3.1G data set and a total of 3.2G of RAM: what do
>> you expect to be able to do with this data set once you've read
>> it in?  Have you considered storing it in a database and accessing
>> just the bits you need at any one time?
>>
>>   Ben Bolker
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide commented, minimal,
>> self-contained, reproducible code.
>
>



-- 
Gabor Csardi <Gabor.Csardi at unil.ch>     UNIL DGM



