[R] Running out of memory when importing SPSS files

Paul Bivand paul.bivand at gmail.com
Thu Feb 19 14:49:16 CET 2009

2009/2/19 Thomas Lumley <tlumley at u.washington.edu>:
> On Wed, 18 Feb 2009, Uwe Ligges wrote:
>> dobomode wrote:
>>> Hello R-help,
>>> I am trying to import a large dataset from SPSS into R. The SPSS file
>>> is in .SAV format and is about 1GB in size. I use read.spss to import
>>> the file and get an error saying that I have run out of memory. I am
>>> on a Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process
>>> tells me that R runs out of memory when reaching about 3GB of RAM so I
>>> suppose the remaining 1GB is used up by the OS.
>>> Why would a 1GB SPSS file take up more than 3GB of memory in R?
>> Because SPSS stores data in a compressed way?
> Or because R uses quite a lot more memory to read a data set than to store
> it. Either way, even if the data set eventually took up only 1Gb in R you
> still would probably not be able to work usefully with it on a 32-bit
> machine.
> You need to either use a 64-bit system or avoid loading the whole data set.
> Unfortunately read.spss can't read the data selectively [something I'd like
> to fix, sometime], but if you had a .csv file you could read a subset of
> columns or rows using read.table.
> A better bet is likely to be putting the data set into a database (SQLite is
> easiest) and reading subsets of the data that way.  That's how I handle data
> sets of a few Gb (on a laptop with 1Gb memory).
>      -thomas
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu        University of Washington, Seattle
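
To illustrate the read.table route mentioned above: a minimal sketch, assuming
the SPSS data have already been exported to a .csv file. The file and the
column names (id, age, income, region) are made up for the example. Setting an
entry of colClasses to "NULL" skips that column, and nrows caps the number of
rows read.

```r
# Hypothetical stand-in for a large CSV exported from SPSS.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id     = 1:5,
                     age    = c(34, 51, 27, 45, 62),
                     income = c(21, 35, 18, 40, 29),
                     region = c("N", "S", "E", "W", "N")),
          csv, row.names = FALSE)

# One entry per column in the file: "NULL" means the column is skipped
# and never stored. Here we keep id and age, and drop income and region.
keep <- c("integer", "numeric", "NULL", "NULL")

# nrows limits the number of data rows read (the header is not counted).
sub <- read.table(csv, header = TRUE, sep = ",",
                  colClasses = keep, nrows = 3)

str(sub)
```

Giving explicit column classes also spares read.table from guessing types,
which tends to reduce the memory needed while reading.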

You could try using package memisc and only bring in the variables you
need to analyse.

See spss.system.file() and the additional subset() methods in memisc.
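
Roughly, and only as a sketch: the file name "survey.sav" and the variable
names age, income and region are placeholders, not taken from the original
poster's data. The guard simply lets the snippet run harmlessly when memisc or
the file is not available.

```r
sav <- "survey.sav"  # hypothetical path to the large SPSS system file

if (requireNamespace("memisc", quietly = TRUE) && file.exists(sav)) {
  library(memisc)

  # Creates an "importer" object describing the file; the data themselves
  # are not loaded into memory at this point.
  importer <- spss.system.file(sav)

  # Variable names and labels, useful for deciding what to bring in.
  print(description(importer))

  # Read only the variables needed for the analysis (placeholder names).
  ds <- subset(importer, select = c(age, income, region))

  # Convert memisc's data.set to an ordinary data frame if preferred.
  df <- as.data.frame(ds)
  str(df)
}
```

Whether this fits in memory still depends on how many variables and cases are
selected, so it is worth starting with a small selection.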

Paul Bivand
Head of Analysis and Statistics

Inclusion has launched a new website, please visit: www.cesi.org.uk
