[R] Reading large files in R

Andreas Hary u08adh at hotmail.com
Tue Aug 9 23:40:24 CEST 2005


Brief correction: it should read

>> van.call <- call('sqlQuery',con,query='select * from vandrivers;')

rather than

>> van.call <- sqlQuery(con,'select * from vandrivers;')

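The latter statement runs the query immediately and loads the data into
memory as usual, whereas the call object is only evaluated later, by
eval(van.call) in the data argument of ssm().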
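
The difference between the two forms can be seen without a database. The following is a minimal sketch in base R; fetch_data is a hypothetical stand-in for sqlQuery (it is not part of RODBC) and simply records when it is actually run.

```r
# Hypothetical stand-in for sqlQuery(): sets a flag when it actually runs.
fetched <- FALSE
fetch_data <- function(n) {
  fetched <<- TRUE
  data.frame(x = seq_len(n))
}

# call() only builds an unevaluated call object; nothing is fetched yet.
deferred <- call("fetch_data", 5)
fetched_before_eval <- fetched

# The data are materialised only when the call is evaluated,
# which is what eval(van.call) does in the data argument above.
result <- eval(deferred)
fetched_after_eval <- fetched
```

Here fetched_before_eval stays FALSE and fetched_after_eval becomes TRUE, so the work is deferred until eval() is reached. Note that the result still occupies memory once it has been evaluated; the gain is that no copy is kept in the workspace beforehand.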
Best wishes,

Andreas




----- Original Message ----- 
From: "Andreas Hary" <u08adh at hotmail.com>
To: "Berton Gunter" <gunter.berton at gene.com>; <ramasamy at cancer.org.uk>; 
"'Jean-Pierre Gattuso'" <gattuso at obs-vlfr.fr>
Cc: <r-help at stat.math.ethz.ch>
Sent: Monday, August 08, 2005 10:49 PM
Subject: Re: [R] Reading large files in R


> You can also use the RODBC package to hold the data in a database, say MySQL,
> and only import it when you do the modelling, e.g.
>
>> library(RODBC)
>> library(sspir)
>> con <- odbcConnect("MySQL Test")
>> data(vandrivers)
>> sqlSave(con,dat=vandrivers,append=FALSE)
>> rm(vandrivers)
>> gc()
>> van.call <- sqlQuery(con,'select * from vandrivers;')
>> vd <- ssm( y ~ tvar(1) + seatbelt + sumseason(time,12),
>>           time=time, family=poisson(link="log"),
>>           data=eval(van.call))
>> vd$ss$phi["(Intercept)"] <- exp(- 2*3.703307 )
>> vd$ss$C0 <- diag(13)*1000
>> vd.res <- kfs(vd)
>> gc()
>
> In this case I have first saved the vandrivers data in 'MySQL Test', but one
> can obviously write the data directly to the database. Since the data are not
> held in memory, I find that I can do much larger computations than are
> otherwise possible. The downside is of course that computations take a bit
> longer.
> Best wishes,
>
> Andreas
>
> =====================
> Andreas D Hary
> Email:    u08adh at hotmail.com
> Mobile:   07906860987
> Phone:   02076554940
>
>
>
>
> ----- Original Message ----- 
> From: "Berton Gunter" <gunter.berton at gene.com>
> To: <ramasamy at cancer.org.uk>; "'Jean-Pierre Gattuso'" 
> <gattuso at obs-vlfr.fr>
> Cc: <r-help at stat.math.ethz.ch>
> Sent: Monday, August 08, 2005 8:35 PM
> Subject: Re: [R] Reading large files in R
>
>
>> ... and it is likely that even if you did have enough memory (several times
>> the size of the data are generally needed) it would take a very long time.
>>
>> If you do have enough memory and the data are all of one type -- numeric
>> here -- you're better off treating it as a matrix rather than converting it
>> to a data frame.
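
The matrix suggestion above might look as follows. This is only a sketch in base R: the temporary file, its three columns a, b and c, and the -99 missing-value code are assumptions modelled on the original question, not the poster's real data.

```r
# Write a small tab-separated numeric file to stand in for the real data.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t-99\t6", "7\t8\t9"), path)

# With what = numeric(), scan() returns one flat numeric vector.
vals <- scan(path, what = numeric(), sep = "\t", skip = 1,
             na.strings = "-99", quiet = TRUE)

# The values arrive in row order, so fill the matrix by row.
m <- matrix(vals, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))
unlink(path)
```

A numeric matrix is a single vector with a dim attribute, so this avoids the extra per-column processing that data.frame() performs.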
>>
>> -- Bert Gunter
>> Genentech Non-Clinical Statistics
>> South San Francisco, CA
>>
>> "The business of the statistician is to catalyze the scientific learning
>> process."  - George E. P. Box
>>
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
>>> Adaikalavan Ramasamy
>>> Sent: Monday, August 08, 2005 12:02 PM
>>> To: Jean-Pierre Gattuso
>>> Cc: r-help at stat.math.ethz.ch
>>> Subject: Re: [R] Reading large files in R
>>>
>>> From the Note section of help("read.delim"):
>>>
>>>      'read.table' is not the right tool for reading large matrices,
>>>      especially those with many columns: it is designed to read _data
>>>      frames_ which may have columns of very different classes. Use
>>>      'scan' instead.
>>>
>>> So I am not sure why you used 'scan' and then converted the result to a
>>> data frame.
>>>
>>> 1) Can you provide a sample of the data that you are trying to read in?
>>> 2) How much memory does your machine have?
>>> 3) Try reading in the first few lines using the nmax argument in scan.
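
Point 3 can be tried on a small scale first. The sketch below uses a made-up temporary file with the same three-column layout as in the question; the file contents and the choice of nmax = 5 are assumptions for illustration only.

```r
# Build a 100-record, tab-separated test file with a header line.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc",
             sprintf("%d\t%d\t%d", 1:100, 101:200, 201:300)), path)

# When 'what' is a list, nmax caps the number of records read,
# so only the first five data lines are parsed.
head_rows <- scan(path, what = list(a = 0, b = 0, c = 0), sep = "\t",
                  skip = 1, nmax = 5, quiet = TRUE)
unlink(path)
```

The result is a list of three numeric vectors of length 5, which is enough to check that the separators and column types are read as intended before attempting the full file.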
>>>
>>> Regards, Adai
>>>
>>>
>>>
>>> On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
>>> > Dear R-listers:
>>> >
>>> > I am trying to work with a big (262 Mb) file but apparently reach a
>>> > memory limit using R on Mac OS X as well as on a Unix machine.
>>> >
>>> > This is the script:
>>> >
>>> >  > type=list(a=0,b=0,c=0)
>>> >  > tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
>>> >      sep="\t", quote="\"", dec=".", skip=1, na.strings="-99",
>>> >      nmax=13669628)
>>> > Read 13669627 records
>>> >  > gebco <- data.frame(tmp)
>>> > Error: cannot allocate vector of size 106793 Kb
>>> >
>>> >
>>> > Even tmp does not seem right:
>>> >
>>> >  > summary(tmp)
>>> > Error: recursive default argument reference
>>> >
>>> >
>>> > Do you have any suggestion?
>>> >
>>> > Thanks,
>>> > Jean-Pierre Gattuso
>>> >
>>> > ______________________________________________
>>> > R-help at stat.math.ethz.ch mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>> >



