[Rd] RE: Variable lables (was Re: [R] Reading SAS version 8 data into

Warnes, Gregory R gregory_r_warnes@groton.pfizer.com
Fri, 24 Aug 2001 14:18:24 -0400


[Moved from R-help]

> From: fharrell@virginia.edu [mailto:fharrell@virginia.edu]
> I store variable labels as "label" attributes of vectors
> and use them in various plotting functions as well as the
> describe() function.

I would like to see general support for label attributes in the R plotting
and modeling functions.  One possible way of implementing this is to create
a replacement for the standard "deparse(substitute(blah))" idiom. This
function, getlabel(), checks for a label attribute and returns that if
present. Otherwise it returns the variable's name as a string.

Here's some code I've put together:

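# Return the "label" attribute of x (NULL if none has been set).
label  <-  function(x) attr(x,"label")

# Replacement form, so that  label(x) <- "text"  sets the attribute.
# Setting the attribute directly avoids re-evaluating the matched call,
# which fails when the assignment is made inside another function.
"label<-" <-  function(x, value )
  {
    attr(x, "label")  <- value
    x
  }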

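getlabel <- function(x)
  {
    tmp <- attr(x,"label")
    # identical() also copes with labels that are not a single string
    if(is.null(tmp) || identical(tmp, ""))
      {
        # No usable label: evaluate substitute(<argument>) in the caller's
        # frame, so that getlabel(x) inside a function gives the same
        # result as deparse(substitute(x)) written in that function.
        m  <- match.call()
        tmp <- deparse(eval(call("substitute", m$x), envir=parent.frame()))
      }
    return(tmp)
  }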
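
As a rough sketch of how getlabel() could stand in for
"deparse(substitute(x))" inside a plotting function, the example below
wraps plot().  The wrapper labelled_plot() and the sample data are
hypothetical and only for illustration; getlabel() is restated in compact
form so that the sketch runs on its own.

```r
# Compact restatement of getlabel() so this sketch is self-contained.
getlabel <- function(x) {
  lab <- attr(x, "label")
  if (is.null(lab) || identical(lab, ""))
    # Evaluate substitute(<argument>) in the caller's frame.
    lab <- deparse(eval.parent(substitute(substitute(x))))
  lab
}

# Hypothetical wrapper: use the label if present, else the expression
# the user passed, exactly where deparse(substitute()) would be used.
labelled_plot <- function(x, y, ...) {
  xlab <- getlabel(x)
  ylab <- getlabel(y)
  plot(x, y, xlab = xlab, ylab = ylab, ...)
  invisible(c(xlab = xlab, ylab = ylab))
}

# Made-up sample data: one labelled variable, one unlabelled.
age <- c(21, 35, 48)
attr(age, "label") <- "Age (years)"
height <- c(170, 165, 180)

# labelled_plot(age, height)   # draws the plot with the resolved labels
```

Calling labelled_plot(age, height) should label the x axis with the stored
label and the y axis with the variable name, since height has no label.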

I've done some testing, and getlabel seems to work fine as a substitute for
"deparse(substitute(x))" in the plot commands.

There are a couple of problems.  First, attributes are carried along in
sometimes unexpected ways.  For example, attributes are carried along by all
of the arithmetic operations I tried:
   > x <- rnorm(1)
   > label(x) <- "x label"
   > 
   > sqrt(x)
    [1] 0.8888801
   attr(,"label")
   [1] "x label"
   > x+1
    [1] 1.8888801
   attr(,"label")
   [1] "x label"
Ideally, performing an operation that creates a new variable should mask off
the label attribute (what about other attributes?).  I recognize that this
would require changes to R.  Would this be a big task?
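
Until something like that exists, the label can be dropped by hand.  The
sketch below shows the carry-over and one possible workaround; the helper
unlabel() is hypothetical and not part of R or of the code above.

```r
# Hypothetical helper: return x with its "label" attribute removed.
unlabel <- function(x) {
  attr(x, "label") <- NULL
  x
}

x <- 4
attr(x, "label") <- "x label"

y <- sqrt(x)            # the label is carried along to the result
z <- unlabel(sqrt(x))   # the label is masked off explicitly

attr(y, "label")        # still the label of x
attr(z, "label")        # NULL
```

This only removes the one attribute; whether other attributes should also
be dropped is left open, as in the question above.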

Second, unless one bounds the length of the labels, it can get pretty messy
to use them in some places (e.g. the coefficients table reported by
print.summary).  I can see a couple of solutions for this problem.  A)
Truncate labels when necessary.  B) Have 2 attributes: one short 'label'
that has a fixed maximum length (say 30 characters), and one long
'description' that has no length limit.  C) Continue to use the variable
name given in the call for places where length is a problem, but show a
translation between the variable name and the label somewhere else as part
of the output.
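
Options A and C could look roughly like the sketch below.  The helpers
short_label() and label_key(), the 30-character default and the sample
variables are all assumptions made for illustration, not existing R
functions.

```r
# Option A (hypothetical helper): truncate a label to at most `width`
# characters, falling back to the variable name if there is no label.
short_label <- function(x, name, width = 30) {
  lab <- attr(x, "label")
  if (is.null(lab) || identical(lab, "")) return(name)
  if (nchar(lab) > width)
    lab <- paste0(substr(lab, 1, width - 3), "...")
  lab
}

# Option C (hypothetical helper): build a name-to-label translation table
# from a named list of variables (a data frame is such a list).
label_key <- function(vars) {
  labs <- vapply(names(vars), function(nm) {
    lab <- attr(vars[[nm]], "label")
    if (is.null(lab)) "" else lab
  }, character(1))
  data.frame(variable = names(vars), label = unname(labs),
             stringsAsFactors = FALSE)
}

# Made-up sample variables.
age <- c(21, 35, 48)
attr(age, "label") <- "Age (years)"
bp <- c(120, 135, 128)
attr(bp, "label") <- "Systolic blood pressure at baseline visit (mm Hg)"
vars <- list(age = age, bp = bp)

short_label(bp, "bp")   # truncated to 30 characters
label_key(vars)         # one row per variable: name and full label
```

Option B would need nothing beyond a second attribute, set and read the
same way as "label".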

Except for the problem of the label attribute getting 'carried along' when
it is not desirable, I think that it would be straightforward and 'backwards
compatible' to add general support for variable labels.  

I am willing to submit patches for functions that I regularly use.  Would
others be willing to contribute?  Would the patches be accepted?

-Greg

