[R] compress data on read, decompress on write

Gregory Warnes gregory.warnes at mac.com
Thu Feb 28 23:54:10 CET 2008


You might look at storing the data using R's "raw" data type...
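
A minimal sketch of that idea, assuming an R version that provides memCompress() and memDecompress() (they were added to base R after this thread) and using a made-up ragged list in place of the real C output. The names `states` and `fit` are hypothetical. The object is serialized to a raw vector, compressed in memory, kept as a list component, and expanded again only when it is needed:

```r
# Hypothetical ragged array: hidden-state paths of differing lengths.
states <- list(c(1L, 1L, 2L), c(3L, 3L, 3L, 3L, 1L), 2L)

# Pack: serialize to a raw vector, then compress that vector in memory.
packed <- memCompress(serialize(states, connection = NULL), type = "gzip")

# Store the compressed raw vector as one component of the result object.
fit <- list(note = "hypothetical result object", states = packed)

# Unpack just before the data would be handed to the C code.
restored <- unserialize(memDecompress(fit$states, type = "gzip"))
```

How much space this saves depends on the data; very small objects can even grow slightly because of the serialization header.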

-G


On Feb 28, 2008, at 5:38 PM, Ramon Diaz-Uriarte wrote:

> Dear Christos,
>
> Thanks for your reply. Actually, I should have been more careful with
> my language: it's not really a sparse matrix, but rather a ragged array
> that results from a more compact representation we thought of for the
> hidden states in a Hidden Markov Model over many runs of MCMC. However,
> it might make sense for us to check sparseMatrix and see how it's done
> there.
>
> Thanks,
>
> R
>
> On Thu, Feb 28, 2008 at 7:49 PM, Christos Hatzis
> <christos.hatzis at nuverabio.com> wrote:
>> Ramon,
>>
>>  If you are looking for a solution to your specific application (as
>>  opposed to a general compression/decompression mechanism), it might be
>>  worth checking out the Matrix package, which has facilities for storing
>>  and manipulating sparse matrices.  The sparseMatrix class stores
>>  matrices in the triplet representation (i.e. only indices and values of
>>  the non-zero elements) and this affords great compression ratios,
>>  depending on the size and degree of sparseness of the matrix.
>>
>>  -Christos
>>
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
>>> Sent: Thursday, February 28, 2008 1:18 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] compress data on read, decompress on write
>>>
>>> Dear All,
>>>
>>> I'd like to be able to have R store (in a list component) a
>>> compressed data set, and then write it out uncompressed.
>>> gzcon and gzfile work in exactly the opposite direction. What
>>> would be a good way to handle this?
>>>
>>> Details:
>>> ----------
>>>
>>> We have a package that uses C; part of the C output is a large sparse
>>> matrix. This is never manipulated directly by R, but always by the C
>>> code. However, we need to store that data somewhere (inside an R
>>> object) for further calls to the functions in our package. We'd like
>>> to store that matrix as part of the R object (say, as an element of a
>>> list). Ideally, it would be stored in as compressed a form as
>>> possible. Then, when we need to use that information, it would be
>>> decompressed and passed to the C function.
>>>
>>> I guess one way to do it is to have C deal with the compression and
>>> decompression (e.g., using the zlib or bzip2 libraries) and then use
>>> readBin, etc., from R. But, if I can, I'd like to avoid having our C
>>> code call zlib, etc., so as to keep our package easily portable.
>>>
>>>
>>> Thanks,
>>>
>>> R.
>>>
>>> --
>>> Ramon Diaz-Uriarte
>>> Statistical Computing Team
>>> Structural Biology and Biocomputing Programme Spanish
>>> National Cancer Centre (CNIO) http://ligarto.org/rdiaz
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> -- 
> Ramon Diaz-Uriarte
> Statistical Computing Team
> Structural Biology and Biocomputing Programme
> Spanish National Cancer Centre (CNIO)
> http://ligarto.org/rdiaz


