[R] Best method to add unit information to dataframe ?

Marc Schwartz marc_schwartz at me.com
Mon Oct 3 17:08:42 CEST 2011


On Oct 3, 2011, at 9:35 AM, bruno Piguet wrote:

> Dear all,
> 
>  I'd like to have a dataframe store information about the units of
> the data it contains.
> 
>  You'll find below a minimal example of the way I do it so far. I add a
> "units" attribute to the dataframe. But I don't like the long syntax
> needed to access the unit of a given variable, namely something
> like:
>   var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame,
> "names"))]]
> 
>  Can anybody point me to a better solution ?
> 
> Thanks in advance,
> 
> Bruno.
> 
> 
> # Dataframe creation
> x <- c(1:10)
> y <- c(11:20)
> z <- c(101:110)
> my_frame <- data.frame(x, y, z)
> attr(my_frame, "units") <- c("x_unit", "y_unit")
> 
> #
> # later on, using dataframe
> for (var_name in c("x", "y")) {
>   idx <- match(var_name, attr(my_frame, "names"))
>   var_unit <- attr(my_frame, "units")[[idx]]
>   print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
> }

The problem is that some operations on data frames (e.g. subset()) will strip your attributes:

> str(my_frame)
'data.frame':	10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: int  11 12 13 14 15 16 17 18 19 20
 $ z: int  101 102 103 104 105 106 107 108 109 110
 - attr(*, "units")= chr  "x_unit" "y_unit"

> newDF <- subset(my_frame, x <= 5)

> str(newDF)
'data.frame':	5 obs. of  3 variables:
 $ x: int  1 2 3 4 5
 $ y: int  11 12 13 14 15
 $ z: int  101 102 103 104 105
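
If you stay with a data-frame-level attribute, one workaround is to copy it back after each operation that drops it. The following is only a minimal sketch: copy_units() is a hypothetical helper name, not a function from base R or any package.

```r
# Sketch only: copy_units() is a hypothetical helper, not part of base R.
x <- 1:10
y <- 11:20
z <- 101:110
my_frame <- data.frame(x, y, z)
attr(my_frame, "units") <- c("x_unit", "y_unit")

# Copy the "units" attribute from the original data frame onto a derived one.
copy_units <- function(new, old) {
  attr(new, "units") <- attr(old, "units")
  new
}

# Subset as before, then re-attach the units taken from the original.
newDF <- copy_units(subset(my_frame, x <= 5), my_frame)
str(newDF)
```

Note that this only makes sense when the derived data frame keeps the same columns in the same order, since the units here are matched to columns by position.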


You might want to look at either ?comment or the label() function (see ?label) in Frank Harrell's Hmisc package on CRAN, either to use directly or as example code for how he handles this.
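
As for the long lookup syntax itself, one option that needs no extra package is to name the elements of the units vector after the columns, so that a unit can be fetched by name instead of via match(). This is a sketch under that assumption; unit_of() is a hypothetical helper name, and the attribute is still subject to the stripping issue described above.

```r
# Sketch only: unit_of() is a hypothetical helper name.
my_frame <- data.frame(x = 1:10, y = 11:20, z = 101:110)

# Name each unit after its column so it can be looked up by name.
attr(my_frame, "units") <- c(x = "x_unit", y = "y_unit")

# Return the unit recorded for one column, or NA if none was recorded.
unit_of <- function(df, var_name) {
  u <- attr(df, "units")
  if (is.null(u) || !(var_name %in% names(u))) NA_character_ else u[[var_name]]
}

for (var_name in c("x", "y")) {
  print(paste("max", var_name, ":", max(my_frame[[var_name]]),
              unit_of(my_frame, var_name)))
}
```

Looking units up by name also means that a column without a recorded unit (z here) gives NA rather than an out-of-bounds error.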

HTH,

Marc Schwartz


