[R] Running out of memory when importing SPSS files

Paul Bivand paul.bivand at gmail.com
Thu Feb 19 14:49:16 CET 2009


2009/2/19 Thomas Lumley <tlumley at u.washington.edu>:
> On Wed, 18 Feb 2009, Uwe Ligges wrote:
>
>> dobomode wrote:
>>>
>>> Hello R-help,
>>>
>>> I am trying to import a large dataset from SPSS into R. The SPSS file
>>> is in .SAV format and is about 1GB in size. I use read.spss to import
>>> the file and get an error saying that I have run out of memory. I am
>>> on a Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process
>>> tells me that R runs out of memory when reaching about 3GB of RAM so I
>>> suppose the remaining 1GB is used up by the OS.
>>>
>>> Why would a 1GB SPSS file take up more than 3GB of memory in R?
>>
>> Because SPSS stores data in a compressed way?
>
> Or because R uses quite a lot more memory to read a data set than to store
> it. Either way, even if the data set eventually took up only 1Gb in R you
> still would probably not be able to work usefully with it on a 32-bit
> machine.
>
> You need to either use a 64-bit system or avoid loading the whole data set.
>  Unfortunately read.spss can't read the data selectively [something I'd like
> to fix, sometime], but if you had a .csv file you could read a subset of
> columns or rows using read.table.
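
A minimal sketch of the read.table route described above, assuming the SPSS data have already been exported to CSV. The file and its columns (id, age, region) are made-up stand-ins, not anything from the original data set. read.csv is a thin wrapper around read.table, so the same arguments apply to both.

```r
# Write a small stand-in CSV (hypothetical data, only for illustration).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     age = c(23, 35, 41, 52, 67),
                     region = c("N", "S", "E", "W", "N")),
          tmp, row.names = FALSE)

# Subset of columns: a colClasses entry of "NULL" makes read.table
# skip that column entirely, so it is never stored in memory.
sub <- read.csv(tmp, colClasses = c("integer", "numeric", "NULL"))

# Subset of rows: skip the header line plus the first two records,
# then read two records. The header is skipped, so names are given by hand.
rows <- read.csv(tmp, header = FALSE, skip = 3, nrows = 2,
                 col.names = c("id", "age", "region"))
```

Giving colClasses explicitly also spares read.table from guessing column types, which tends to reduce the memory used while reading.
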
>
> A better bet is likely to be putting the data set into a database (SQLite is
> easiest) and reading subsets of the data that way.  That's how I handle data
> sets of a few Gb (on a laptop with 1Gb memory).
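
The database approach could look roughly like the sketch below. It assumes the DBI and RSQLite packages are installed; the table name "survey" and its columns are hypothetical. A real multi-Gb file would be loaded into the database in pieces (for example from a CSV export) rather than from a data frame that is already in memory.

```r
library(DBI)

# Hypothetical stand-in data, only for illustration.
survey <- data.frame(id = 1:6,
                     age = c(23, 35, 41, 52, 67, 30),
                     region = c("N", "S", "E", "W", "N", "S"),
                     stringsAsFactors = FALSE)

# Create an on-disk SQLite database and store the table in it.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "survey", survey)

# Pull only the columns and rows needed for the analysis at hand.
north <- dbGetQuery(con,
  "SELECT id, age FROM survey WHERE region = 'N' ORDER BY id")

dbDisconnect(con)
```

Only the result of the query is held in R's memory; the full table stays on disk.
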
>
>
>      -thomas
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu        University of Washington, Seattle

You could try the memisc package and bring in only the variables you
need to analyse.

See spss.system.file() and the additional subset() methods in memisc.
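
Something along these lines, as a rough sketch only: the path "survey.sav" and the variable names age and region are placeholders, so substitute your own. The code is guarded so it does nothing if the file is not there.

```r
sav_path <- "survey.sav"   # placeholder path, not a real file

if (file.exists(sav_path)) {
  library(memisc)

  # Create an importer object for the .sav file; the data themselves
  # are not loaded into memory at this point.
  importer <- spss.system.file(sav_path)

  # Load only the variables named in select (placeholder names here).
  dat <- subset(importer, select = c(age, region))

  # Convert to an ordinary data frame if that is more convenient.
  df <- as.data.frame(dat)
}
```
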

Paul Bivand

---------------------------------------------------------
Paul Bivand
Head of Analysis and Statistics
Inclusion

Inclusion has launched a new website, please visit: www.cesi.org.uk