[Rd] model.matrix metadata

Patrick O'Reilly patrick.a.oreilly at gmail.com
Fri Oct 17 13:34:28 CEST 2014


Hi,

As far as I am aware, the model.matrix function does not return
complete metadata on what each column of the model matrix "means".

The columns are named (e.g. age:genderM), but encoding the metadata as
strings can result in ambiguity. For example, the dummy variable for
level "0" of a factor named var0 and the dummy variable for level "00"
of a factor named var are both named var00. Additionally, if a level of
a factor contains a colon, the resulting column name could be mistaken
for an interaction.
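
To make both cases concrete, here is a minimal sketch in R. The data
frames, variable names and levels are made up for illustration, and the
default treatment contrasts are assumed.

```r
# Hypothetical data: factor name + level name concatenate to the same string.
# Levels are given explicitly so that "a" and "b" are the baseline levels.
d <- data.frame(
  var0 = factor(c("a", "0", "a", "0"), levels = c("a", "0")),
  var  = factor(c("b", "b", "00", "00"), levels = c("b", "00"))
)
mm <- model.matrix(~ var0 + var, data = d)
cn <- colnames(mm)
print(cn)  # two different dummy variables share one name

# Hypothetical data: a level containing a colon.
# The column for level "x:hy" of g reads like an interaction between
# a factor g (level "x") and a factor h (level "y").
d2 <- data.frame(
  g = factor(c("u", "x:hy", "u", "x:hy"), levels = c("u", "x:hy"))
)
cn2 <- colnames(model.matrix(~ g, data = d2))
print(cn2)
```

In neither case can the column name alone be mapped back to a unique
variable and level.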

While a human can generally work out the meaning of each column by
inspection, I am interested in achieving this programmatically.

My solution is to edit the modelmatrix function in
/src/library/stats/src/model.c to additionally return the following:

intrcept
factors
contr1
contr2
count

With these available in R, it is possible to determine the precise
meaning of each column without error-prone parsing of strings. I have
attached my edit: see lines 753-764.

I am seeking advice on this approach. Am I missing a simpler way of
achieving this (which perhaps avoids rebuilding R)?
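
For comparison, the sketch below shows the metadata that is already
reachable from R without rebuilding: the "assign" and "contrasts"
attributes of the model matrix and the "factors" attribute of the terms
object. The data frame is made up for illustration. This only identifies
the term behind each column; it does not say which factor level (or
contrast) a column represents within that term, so it is a partial
answer at best.

```r
# Hypothetical data for illustration.
d <- data.frame(
  age    = c(21, 35, 48, 52, 60, 33),
  gender = factor(c("F", "M", "F", "M", "F", "M")),
  site   = factor(c("a", "b", "c", "a", "b", "c"))
)
f  <- ~ age * gender + site
mm <- model.matrix(f, data = d)
tt <- terms(f)

# "assign" gives, for each column, the index of the term it belongs to
# (0 denotes the intercept).
assign_idx  <- attr(mm, "assign")
term_labels <- attr(tt, "term.labels")
col_term    <- c("(Intercept)", term_labels)[assign_idx + 1]
print(data.frame(column = colnames(mm), term = col_term))

# Which variables enter each term (rows = variables, columns = terms).
print(attr(tt, "factors"))

# Which contrasts were used for each factor.
print(attr(mm, "contrasts"))
```

Here the two site columns map to the term "site" by index rather than by
name, so a colon or digit in a level name does not affect the lookup.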

Since model.matrix is used in so many modeling functions, this would be
very helpful for the programmatic interpretation of model output. A
search on the Internet suggests there are other R users who would
welcome such functionality.

Many thanks in advance,

Pat O'Reilly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model.c
Type: text/x-csrc
Size: 56318 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20141017/5ba1e6aa/attachment.bin>

