[R] Large Stata file Import in R

Xavier xfim.ll at gmail.com
Tue Jul 7 11:40:25 CEST 2009


Thomas Lumley wrote on Tue, 30 Jun 2009:

> On Tue, 30 Jun 2009, Xavier wrote:
>
>> saurav pathak wrote on Mon, 29 Jun 2009:
>>
>>> Hi
>>>
>>> I am using Stata 10 and I need to import a Stata 10 data set into R. I
>>> have also saved the dataset in older versions of Stata by using Stata's
>>> saveold command.
>>>
>>> My RAM is 4 GB and the Stata file is 600 MB, but I am getting an error
>>> message which says:
>>>
>>> "Error: cannot allocate vector of size 3.4 Mb
>>> In addition: There were 50 or more warnings (use warnings() to see the
>>> first 50)"
>>>
>>> Thus far I have already tried the following
>>
>> Maybe it does not address the R problem that you are asking about, but
>> you can try to "compress" the Stata file before saving it, and maybe the
>> size of the file will decrease.
>>
>
> This can't possibly help.  The problem is that *R* is running out of 
> memory, and storing the data elements in less space *on disk* won't help 
> with the space used in memory.  Stata's -compress- option just chooses 
> smaller data types, e.g., byte instead of integer.

I have done a small test and it seems that it can help.

I have a big dataset in Stata to which I applied the "compress" command
(in Stata), getting a smaller file. These are the sizes reported by Stata:
-----8<---------------
# original data size in Stata
Contains data from G:\tmp\example-big.dta
  obs:        52,547                          Written by R.              
 vars:            54                          
 size:    21,807,005 (96.4% of memory free)


# data size once "compress" has been used
Contains data from example-small.dta
  obs:        52,547                          Written by R.              
 vars:            54                          3 Jul 2009 15:27
 size:    17,918,527 (97.1% of memory free)

-----8<---------------

And when loaded into R:
-----8<---------------
> library(foreign)
> big <- read.dta("example-big.dta") 
> small <- read.dta("example-small.dta")
> object.size(big)
20819600 bytes
> object.size(small)
19558520 bytes
-----8<---------------
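
To see exactly which columns shrink, the two objects can be compared column
by column. A small sketch (I have not pasted its output here):
-----8<---------------
# per-column sizes of both data frames, in bytes
sizes <- data.frame(big   = sapply(big,   object.size),
                    small = sapply(small, object.size))
# show only the columns whose in-memory size changed
sizes[sizes$big != sizes$small, ]
-----8<---------------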

The difference once the objects are in memory is not as big as it is on
disk, presumably because R stores every Stata integer type (byte, int, long)
in the same 4 bytes, so the only columns that shrink in R are numeric ones
that compress managed to turn into an integer type. Even so, it seems a good
idea to compress the data in Stata before loading it into R, if memory is a
problem.
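
And if compressing is not enough to avoid the allocation error, another
option (which I have not tried on this dataset) is to export the data from
Stata as text with outsheet and then load only the columns that are really
needed; read.csv can skip a column by marking it as "NULL" in colClasses. A
minimal sketch, assuming the file has been exported as example-big.csv and
only the first three columns are wanted:
-----8<---------------
# read one row first, just to count the columns
hdr <- read.csv("example-big.csv", nrows = 1)
# skip every column ("NULL") except the first three (NA = let R guess)
classes <- rep("NULL", ncol(hdr))
classes[1:3] <- NA
needed <- read.csv("example-big.csv", colClasses = classes)
-----8<---------------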

-- 
-  Xavier  -



