[R] Numeric class and sasxport.get

Sebastien Bihorel Sebastien.Bihorel at cognigencorp.com
Wed Feb 4 20:36:58 CET 2009


I also realized the flaw after testing the script on various datasets...

Following up on your last note:
1- Is that the reason why the class of integer and regular numeric 
variables is solely "labelled" after sasxport.get?
2- Can class be 'soft' for other kinds of variables?
3- Would you expect the following wrapper function to cause 
incompatibilities with other R functions?
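
On question 2, a minimal base-R sketch (not taken from the Hmisc 
documentation) of what "soft" seems to mean here: an object with no class 
attribute still reports an implicit class derived from its type. The 
variable names are illustrative only.

```r
# Implicit ("soft") classes: class() reports a value even though no
# class attribute is stored on the object.
vals <- list(int = 1:3, num = c(0.5, 1.5), chr = c("a", "b"), lgl = TRUE)

# oldClass() returns only the stored class attribute (NULL for all of these),
# whereas class() falls back to the implicit class.
stored   <- lapply(vals, oldClass)
implicit <- sapply(vals, class)

# A factor, by contrast, carries an explicit class attribute.
f <- factor(c("x", "y"))
explicit_factor <- oldClass(f)
```

If this reading is right, character and logical vectors behave like 
integers and regular numerics in this respect, while factors do not.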


SASxpt.get <- function(file, force.single = TRUE,
                  method=c('read.xport','dataload','csv'), formats=NULL,
                  allow=NULL, out=NULL, keep=NULL, drop=NULL, as.is=0.5,
                  FUN=NULL) {

  foo <- sasxport.get(file=file, force.single=force.single, method=method,
                      formats=formats, allow=allow, out=out, keep=keep,
                      drop=drop, as.is=as.is, FUN=FUN)

  # For each variable of class "labelled" (and only "labelled"), add the
  # native class as a second class element

  sglClassVarInd <- which(sapply(lapply(unclass(foo), class), length) == 1)

  # Loop over the indices themselves so that nothing runs when none are found
  # (1:length(x) would iterate over c(1, 0) for an empty x)
  for (j in sglClassVarInd) {
    x <- foo[[j]]
    if (identical(class(x), "labelled"))
      class(foo[[j]]) <- c(class(x), class(unclass(x)))
  }
  return(foo)
}
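
To check the class-appending step without a SAS transport file, here is a 
small sketch on hand-built vectors. The helper name addNativeClass and the 
test values are hypothetical, and the "labelled" class is set manually 
rather than by Hmisc, so this only illustrates the logic.

```r
# Hypothetical helper: append the implicit class to a vector whose only
# class is "labelled"; leave everything else untouched.
addNativeClass <- function(x) {
  if (identical(class(x), "labelled")) {
    class(x) <- c(class(x), class(unclass(x)))
  }
  x
}

# Hand-built stand-ins for sasxport.get output (assumption: the class and
# label attributes are set manually here, not by Hmisc).
id   <- structure(1:3, label = "Subject ID", class = "labelled")
dose <- structure(c(0.5, 1, 2), label = "Dose (mg)", class = "labelled")
sex  <- structure(factor(c("F", "M", "F")), label = "Sex",
                  class = c("labelled", "factor"))

# Apply the helper to each stand-in column and summarise the classes.
cols <- lapply(list(id = id, dose = dose, sex = sex), addNativeClass)
sapply(cols, function(v) paste(class(v), collapse = " "))
```

Variables that already have a second class, such as the factor above, are 
left as they are, and the label attribute is kept in every case.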


*Sebastien Bihorel, PharmD, PhD*
PKPD Scientist
Cognigen Corp
Email: sebastien.bihorel at cognigencorp.com 
Phone: (716) 633-3463 ext. 323


Frank E Harrell Jr wrote:
> Sebastien Bihorel wrote:
>> Thanks a lot Frank,
>>
>> One last question, though. I was tempted to remove all attributes of 
>> my variables after the sasxport.get call using
>> foo <- sasxport.get(...)
>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>> Since I have never worked with objects of class 'labelled', I was 
>> wondering what I would lose by removing this attribute.
>
> Not a good idea, for many reasons including dates and other types.
>
> And the labelled type is needed if you subset the data, in order to keep 
> the labels.
>
> Note that your original issue is related to "class" being "soft" for 
> integers and regular numerics:
>
> > x <- 1:3
> > attributes(x)
> NULL
> > class(x)
> [1] "integer"
> > x <- runif(3)
> > class(x)
> [1] "numeric"
> > attributes(x)
> NULL
>
> Frank
>
>>
>>
>> Frank E Harrell Jr wrote:
>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>> The problem is actually not related to a broken command but to an
>>>> attempt at operational qualification of R. A few years ago, my company
>>>> developed a set of scripts for the 'operational qualification' of
>>>> Splus. We are switching to R, so I am currently trying to port the
>>>> scripts to R.
>>>> All Splus scripts imported SAS data using the importData function,
>>>> which I replaced with sasxport.get. One particular script returns the
>>>> class of each variable of the imported data frame; the output must
>>>> match the expected values: numeric, factor, integer, etc. The R
>>>> 'translation' with sasxport.get is thus problematic.
>>>> If there is no easy tweak of the function, we will probably have to
>>>> remove this script from our list of 'qualification' scripts.
>>>>
>>>> Although it would be nice
>>>
>>> Then my advice is to write your own wrapper function for 
>>> sasxport.get that takes its output, looks for labelled variables, 
>>> and adds a new class of your choosing depending on properties of the 
>>> variable, making sure that you write methods needed for that class 
>>> (if any).  Then test your new function, not sasxport.get explicitly.
>>>
>>> Frank
>>>
>>>>
>>>>> Sebastien Bihorel wrote:
>>>>>> Frank,
>>>>>>
>>>>>> It is a non existing issue for me if the variables of class 
>>>>>> "labelled"
>>>>>> (and only "labelled") can only be numerical variables (integer or
>>>>>> numeric).
>>>>>>
>>>>>> Sebastien
>>>>> 'labelled' can apply to any type of vector.  I'm not clear on the
>>>>> problem this causes you.  Please provide a command that is broken by
>>>>> this behavior.
>>>>>
>>>>> Frank
>>>>>
>>>>>> Frank E Harrell Jr wrote:
>>>>>>> Sebastien Bihorel wrote:
>>>>>>>> Dear R-users,
>>>>>>>>
>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>> although character variables might end up being defined as
>>>>>>>> "labelled" "Date" or "labelled" "factor".
>>>>>>>> Is there a way to tell sasxport.get to define numeric variables as
>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>> Sebastien,
>>>>>>>
>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Sebastien
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>
>>>>>>>
>>>>>
>>>>> -- 
>>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>>                       Department of Biostatistics   Vanderbilt University
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>



