[Rd] Re: Variable labels (was Re: [R] Reading SAS version 8 data into

fharrell@virginia.edu
Sat, 25 Aug 2001 09:42:34 -0400


Dear Greg,

I too would like to see labels be more a part of R.
In Hmisc I allow labels to be any length but plotting
and table making functions have options to abbreviate()
them or use variable names instead of labels.

I think your code is more complex than is really needed.
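
As a rough illustration of a simpler form, the accessor and replacement
function can be written directly in terms of attr(), with no match.call()
manipulation. This is only a minimal sketch: the names varlabel and
varlabel<- are hypothetical, chosen so they do not clash with any existing
label<-, and it is not the code used in Hmisc.

```r
# Hypothetical accessor: return the "label" attribute, or NULL if absent.
varlabel <- function(x) attr(x, "label", exact = TRUE)

# Hypothetical replacement function: set the attribute and return the object.
"varlabel<-" <- function(x, value) {
  attr(x, "label") <- value
  x
}

y <- c(1.2, 3.4)
varlabel(y) <- "Serum cholesterol"
varlabel(y)
```

Because the replacement function only touches its own argument, it behaves
the same at top level and inside other functions.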

The problem with defaulting to deparse(...) is that
multiple function pass-throughs return the wrong result:

> f <- function(w)getlabel(w)
> g <- function(z)f(z)
> g(y)
[1] "z"

So I don't see a large role for the deparse(...) method.
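
The pass-through problem can be reproduced with the plain
deparse(substitute(x)) idiom alone. In the sketch below, argname() is a
hypothetical helper, not part of the quoted code. It reports the name of the
nearest formal argument ("w") rather than what the user typed; the quoted
getlabel() appears to look one frame further up, which is why the transcript
above shows "z" instead. A label stored on the object itself does not depend
on the call stack.

```r
# Hypothetical helper using the plain idiom.
argname <- function(x) deparse(substitute(x))

f <- function(w) argname(w)
g <- function(z) f(z)

y <- 1:3
direct  <- argname(y)  # "y": the expression typed by the user
through <- g(y)        # "w": f's formal argument, not "y"

# An explicit label attribute travels with the object through every call.
attr(y, "label") <- "Age in years"
lab  <- function(x) attr(x, "label")
f2   <- function(w) lab(w)
g2   <- function(z) f2(z)
kept <- g2(y)          # "Age in years"
```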

The Hmisc library already defines label<- so if you
are willing to use another name for your version that
would avoid confusing Hmisc users.

The problem of labels being retained after you do
arithmetic on the variable is a real one, and one
I've put up with for a long time with S-Plus.  It would
be nice if R could prevent that but that is getting tricky.
What I've wanted more generally is the ability for the
user to specify a vector of attribute names in options()
that would be preserved upon subsetting.  That way I
wouldn't have to go to the trouble of writing local versions
of [.factor, etc. that carry the 'label' attribute.
In my usage, labels are always logically carried
forward under subsetting.
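
One way to picture both behaviours is with S3 methods on a small class. The
following is a sketch under stated assumptions, not existing R or Hmisc
behaviour: the class name "keeplabel", the option name keep.attrs, and the
helper drop_label() are all hypothetical. The subsetting method restores the
attributes named in the option, while the arithmetic and math methods strip
the label from their result.

```r
# Hypothetical option; R itself does not define or consult keep.attrs.
options(keep.attrs = "label")

# Subsetting: the default `[` drops most attributes, so copy back the ones
# named in the option and restore the class.
"[.keeplabel" <- function(x, ...) {
  out <- NextMethod("[")
  for (a in getOption("keep.attrs", character(0))) {
    attr(out, a) <- attr(x, a, exact = TRUE)
  }
  oldClass(out) <- oldClass(x)
  out
}

# Hypothetical helper: return a plain vector without label or class.
drop_label <- function(out) {
  attr(out, "label") <- NULL
  unclass(out)
}

# Arithmetic and math functions produce a new quantity, so drop the label.
Ops.keeplabel <- function(e1, e2) {
  out <- NextMethod()
  drop_label(out)
}
Math.keeplabel <- function(x, ...) {
  out <- NextMethod()
  drop_label(out)
}

x <- structure(c(10, 20, 30), label = "Weight (kg)", class = "keeplabel")
attr(x[2:3], "label")   # label kept under subsetting
attr(x + 1, "label")    # NULL after arithmetic
attr(sqrt(x), "label")  # NULL after a math function
```

A class-based approach like this only affects objects that opt in; a general
mechanism in R itself would need changes to the default methods.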

Frank


"Warnes, Gregory R" wrote:
> 
> [Moved from R-help]
> 
> > From: fharrell@virginia.edu [mailto:fharrell@virginia.edu]
> > I store variable labels as "label" attributes of vectors
> > and use them in various plotting functions as well as the
> > describe() function.
> 
> I would like to see general support for label attributes in the R plotting
> and modeling functions.  One possible way of implementing this is to create
> a replacement for the standard "deparse(substitute(blah))" idiom. This
> function, getlabel(), checks for a label attribute and returns that if
> present. Otherwise it returns the variable's name as a string.
> 
> Here's some code I've put together:
> 
> label  <-  function(x) attr(x,"label")
> 
> "label<-" <-  function(x, value )
>   {
>     m  <-  match.call()
>     m[[1]]  <- as.name("attr<-")
>     m$value  <- NULL
>     m$which  <- "label"
>     m$value  <- value
>     eval(m)
>   }
> 
> getlabel <- function(x)
>   {
>     tmp <- attr(x,"label")
>     if(is.null(tmp) || tmp=="")
>       {
>         m  <- match.call()
>         m[[1]] <- as.name('substitute')
>         tmp <- deparse(eval(m,envir=parent.frame()))
>       }
>     return(tmp)
>   }
> 
> I've done some testing, and getlabel seems to work fine as a substitute for
> "deparse(substitute(x))" in the plot commands.
> 
> There are a couple of problems.  First, attributes are carried along in
> sometimes unexpected ways.  For example, attributes are carried along by all
> of the arithmetic operations I tried:
>    > x <- rnorm(1)
>    > label(x) <- "x label"
>    >
>    > sqrt(x)
>     [1] 0.8888801
>    attr(,"label")
>    [1] "x label"
>    > x+1
>     [1] 1.8888801
>    attr(,"label")
>    [1] "x label"
> Ideally, performing an operation that creates a new variable should mask off
> the label attribute (what about other attributes?).  I recognize that this
> would require changes to R.  Would this be a big task?
> 
> Second, unless one bounds the length of the labels, it can get pretty messy
> to use them in some places (e.g. the coefficients table reported by
> print.summary).  I can see a couple of solutions for this problem.  A)
> Truncate labels when necessary.   B) Have 2 attributes--One short 'label'
> that has a fixed length (say 30 characters), and one long 'description' that
> has no length limit.  C) Continue to use the variable name given in the
> call for places where length is a problem, but show a translation between
> the variable name and the label somewhere else as part of the output.
> 
> Except for the problem of the label attribute getting 'carried along' when
> it is not desirable, I think that it would be straightforward and 'backwards
> compatible' to add general support for variable labels.
> 
> I am willing to submit patches for functions that I regularly use.  Would
> others be willing to contribute?  Would the patches be accepted?
> 
> -Greg
> 

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat