[Rd] how to manipulate dput output format

Dirk Eddelbuettel edd at debian.org
Thu Jun 21 04:19:14 CEST 2012


On 20 June 2012 at 10:33, Simon Urbanek wrote:
| 
| On Jun 19, 2012, at 11:04 AM, andre zege wrote:
| 
| > I am reading into Java dput output for a matrix, more specifically for a
| > file backed big-matrix. I basically need to lift dimnames for a matrix from
| > dput output. It's no big deal, but the code is very 'hackish' due to the
| > need to get rid of quotes, newlines, parentheses, etc. I was wondering if I
| > could control the dput output format to some extent through options, for
| > example to drop the quotes around each element of the matrix dimnames.
| > Another great thing would be to make dput dump rownames and colnames on
| > two separate lines, but I don't think it's possible. To give a specific
| > example, instead of dput output like
| > 
| > 
| > new("big.matrix.descriptor"
| >    , description = structure(list(sharedType = "FileBacked", filename =
| > "res", totalRows = 1528,
| >    totalCols = 53040, rowOffset = c(0, 1528), colOffset = c(0,
| >    53040), nrow = 1528, ncol = 53040, rowNames = c("A", "AA",
| >    "RNT.A", "ADVA", "AAPL", "AAS", "ABFS", "ABM", "ABT", "ACI",
| >    .......
| > 
| > Ideally I'd prefer a form where rownames and colnames have no quotes or
| > newlines and, if possible, are on separate lines:
| > 
| > new("big.matrix.descriptor"
| >    , description = structure(list(sharedType = "FileBacked", filename =
| > "res", totalRows = 1528,
| >    totalCols = 53040, rowOffset = c(0, 1528), colOffset = c(0,
| >    53040), nrow = 1528, ncol = 53040,
| > rowNames = c(A, AA, RNT.A, ADVA, AAPL, AAS, ABFS, ABM, ABT, ... )
| > colNames = c(...)
| > 
| 
| dput() is intended to be parsed by R, so the above is not possible without massaging the output. But why in the world would you use dput() for something that you want to read in Java? Why don't you use a format that Java can read easily - such as JSON?

Or even use something designed for fast, large scale data serialization such
as Google Protocol Buffers. 

The Google protoc compiler generates the Java code for you, and the R
package RProtoBuf provides the other side.

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
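
For comparison, here is a minimal sketch of the plain two-line layout the
original poster asked for, read from Java without any library. It is not the
JSON or Protocol Buffers route suggested in this thread, and it rests on
several assumptions: the R side has written the names itself, for example
with writeLines(paste(rownames(m), collapse = ",")) followed by the same for
colnames(m); the names contain no commas; and the class and method names
(DimnamesReader, parseNames, readDimnames) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads two comma-separated lines,
// row names first, column names second.
class DimnamesReader {

    // Split one line such as "A,AA,RNT.A" into its names.
    // A missing or blank line yields an empty list.
    static List<String> parseNames(String line) {
        List<String> names = new ArrayList<>();
        if (line == null || line.trim().isEmpty()) {
            return names;
        }
        // Limit -1 keeps trailing empty fields instead of dropping them.
        for (String field : line.split(",", -1)) {
            names.add(field.trim());
        }
        return names;
    }

    // Element 0 holds the row names, element 1 the column names.
    static List<List<String>> readDimnames(BufferedReader in) throws IOException {
        List<List<String>> dimnames = new ArrayList<>();
        dimnames.add(parseNames(in.readLine()));
        dimnames.add(parseNames(in.readLine()));
        return dimnames;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file written by R; assumed layout, see above.
        BufferedReader in = new BufferedReader(new StringReader("A,AA,RNT.A\nc1,c2\n"));
        List<List<String>> dimnames = readDimnames(in);
        System.out.println("rows: " + dimnames.get(0));
        System.out.println("cols: " + dimnames.get(1));
    }
}
```

This only works while the names stay free of commas and newlines. Once that
assumption breaks, a real serialization format such as JSON is the safer
choice.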


