[R] variable labels to accompany data.frame

David Winsemius dwinsemius at comcast.net
Wed Oct 28 19:39:58 CET 2009


As does Muenchen in RforSASSPSSusers.pdf and in the book that grew out  
of that effort:

http://rforsasandspssusers.googlepages.com/RforSASSPSSusers.pdf

http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/0387094172/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1217456813&sr=8-1

http://rforsasandspssusers.com/

Also see the Quick-R page on variable labels:
http://www.statmethods.net/input/variablelables.html
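
For what it is worth, labels can also be kept as attributes on the columns themselves. What follows is only a minimal sketch in base R, not the method of any of the references above; the function name make_codebook and the attribute name "label" are my own assumptions for illustration. Because the codebook is rebuilt from the data frame on demand, dropping or adding a column changes its rows automatically.

```r
# Sketch only: keep each variable label as an attribute on its column.
# "label" is an attribute name chosen for illustration.
df <- data.frame(id = 1:3, bmi2 = c(21.5, 30.2, NA))
attr(df$bmi2, "label") <- "BMI hand-input by clinic personnel, must be checked"

# Hypothetical helper: build a codebook with one row per column of d.
make_codebook <- function(d) {
  stopifnot(is.data.frame(d))
  # Return the column's label, or "" when none has been set.
  get_label <- function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)[1]
  }
  data.frame(
    variable = names(d),
    # Use only the first class so that multi-class columns still give one value.
    class    = unname(vapply(d, function(x) class(x)[1], character(1))),
    type     = unname(vapply(d, typeof, character(1))),
    label    = unname(vapply(d, get_label, character(1))),
    # levels() is NULL for non-factors, which paste() collapses to "".
    levels   = unname(vapply(d, function(x) paste(levels(x), collapse = ", "),
                             character(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Adding or deleting a column is reflected the next time the codebook is built.
df$sex <- factor(c("f", "m", "f"))
df$id  <- NULL
print(make_codebook(df))
```

One caveat: base R subsetting, such as df[1:2, ] or df$bmi2[1:2], generally drops custom attributes from plain vectors, so labels stored this way can be lost unless they are reapplied. The Hmisc package that Ista mentions provides label() and, as far as I know, takes care of this with its own class.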


On Oct 28, 2009, at 2:14 PM, Ista Zahn wrote:

> Alzola and Harrell discuss some of these issues in "An introduction to
> S and the Hmisc and Design Libraries".
>
> -ista
>
> On Wed, Oct 28, 2009 at 1:27 PM, Jacob Wegelin <jacobwegelin at fastmail.fm> wrote:
>>
>> Often it is useful to keep a "codebook" to document the contents of a
>> dataset. (By "dataset" I mean a rectangular structure such as a
>> dataframe.)
>>
>> The codebook has as many rows as the dataset has columns (variables,
>> fields). The columns (fields) of the codebook may include:
>>
>>        •       variable name
>>
>>        •       type (character, factor, integer, etc.)
>>
>>        •       variable label (e.g., a variable called "bmi2" might be
>>        labeled "BMI hand-input by clinic personnel, must be checked")
>>
>>        •       permissible values
>>
>>        •       which values indicate missing (and potentially different
>>        kinds of missing)
>>
>> Some statistics software (e.g., SPSS and Stata) provides at least a
>> subset of this kind of information automatically in a convenient form.
>> For instance, in Stata one can define a "label" for a variable and it
>> is thenceforth linked to the variable. In output from certain modeling
>> and graphics functions, Stata by default uses the label rather than the
>> variable name.
>>
>> Furthermore: In Stata, if "myvariable" is labeled numeric (in R lingo,
>> a factor), and I type
>>
>> codebook myvariable
>>
>> then Stata tells me, among other things, the "levels" of myvariable.
>>
>> Does a tool of this sort exist in R?
>>
>> The prompt() function is related to this, but prompt(someDataFrame)
>> creates a text file on disk. The text file is associated with, but not
>> unambiguously linked to, someDataFrame.
>>
>> The epicalc function codebook() provides a summary of a dataframe
>> similar to that created by summary() but easier to read. But this is
>> not a way to define and keep track of labels that are linked to
>> variables.
>>
>> To link a dataframe to its codebook, one could do the following "by
>> hand": Create a list, say, "somedata", where somedata$DATA is a
>> dataframe that contains the data, and somedata$VARIABLE is also a
>> dataframe, but serves as the codebook. For instance, the following
>> function creates a template that one could subsequently edit to insert
>> variable labels and then use as somedata$VARIABLE.
>>
>> fnJunk <-function( THESEDATA ) {
>> #  From a dataframe, make the start of a codebook.
>>   if(!is.data.frame(THESEDATA)) stop("!is.data.frame(THESEDATA)")
>>   data.frame(
>>      Variable=names(THESEDATA)
>>      , class=sapply(THESEDATA, class)
>>      , type=sapply(THESEDATA, typeof)
>>      , label=""
>>      , comment=""
>>      )
>> }
>>
>>
>> But the following automatic behavior would be nice:
>>
>>        •       We should be able to treat somedata exactly as we treat a
>>        dataframe, so that the fact that it possesses a "codebook" is
>>        merely an added benefit, not an interference with the usual tasks.
>>
>>        •       If we delete a column of somedata$DATA, the associated row
>>        of somedata$VARIABLE should be automatically deleted.
>>
>>        •       If we add a column to somedata$DATA, the associated row
>>        should be inserted in somedata$VARIABLE, and some of the fields
>>        automatically populated, such as variable name and type.
>>
>> It could get fancier. For instance:
>>
>>        •       If we try to add a value to a field in somedata$DATA which
>>        is not permitted by the "permissible values" listed for this field
>>        in somedata$VARIABLE, we get an error.
>>
>> Has anyone already thought this through, maybe defined a class and
>> associated methods?
>>
>> Thanks
>>
>> Jacob A. Wegelin
>> Assistant Professor
>> Department of Biostatistics
>> Virginia Commonwealth University
>> 730 East Broad Street Room 3006
>> P. O. Box 980032
>> Richmond VA 23298-0032
>> U.S.A.
>> E-mail: jwegelin at vcu.edu
>> URL: http://www.people.vcu.edu/~jwegelin
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> -- 
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



