[R] Sparse dataframes?

Karl Ove Hufthammer karl at huftis.org
Tue Jan 15 21:06:15 CET 2013


andrewH wrote:

> Is there a data frame analog to sparse matrices? I am working with a panel
> data set that has a large number of variables that are redefined
> repeatedly or exist for only a few years (out of 48).  In my current
> structure, where variables are columns and rows are years, more than 90
> percent of the cells and more than 3/4 of the total size of my file are
> NAs.
> 
> I am wondering if there is an alternate file specification currently
> available that still allows numeric, character and factor data to be
> stored. Besides just using a database.

How about storing the data in a ‘long’ format, like the one you get when
you apply melt() (with na.rm=TRUE) from the ‘reshape2’ package to your
data frame? The ID part of the data frame (such as the year) is repeated
on each row, which costs some space, but no rows are stored for NA cells,
so for data as sparse as yours (more than 90 percent NA) it should be a
net win. It also makes it very easy to reshape and analyse the data.
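
As a rough sketch of the idea (the data frame ‘wide’ and its columns are
made up for illustration, and I assume all the measured variables are
numeric; with mixed numeric and character columns melt() has to put
everything in a single value column, so it is probably safer to melt
each type separately):

```r
library(reshape2)

# Hypothetical toy panel: rows are years, columns are variables, mostly NA
wide <- data.frame(
  year = 2001:2004,
  gdp  = c(1.5, NA, NA, 2.5),
  pop  = c(NA, 10, 12, NA),
  rate = c(NA, NA, NA, 0.3)
)

# Long format: one row per non-missing (year, variable) pair;
# the NA cells are simply not stored
long <- melt(wide, id.vars = "year", na.rm = TRUE)

# Back to wide when needed; (year, variable) pairs without a row
# become NA again
wide2 <- dcast(long, year ~ variable, value.var = "value")
```

Note that a year with no observed values at all has no rows in ‘long’,
so it will not reappear when you cast back to the wide format.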

Here’s an introduction (to the older ‘reshape’ package, but ‘reshape2’ is 
very similar): http://www.jstatsoft.org/v21/i12

You might also be interested in this paper on ‘tidy’ data:
http://vita.had.co.nz/papers/tidy-data.pdf

-- 
Karl Ove Hufthammer
E-mail: karl at huftis.org
Jabber: huftis at jabber.no


