[R] How to pass na.rm=T to a user defined function

David Winsemius dwinsemius at comcast.net
Fri Jul 29 08:29:49 CEST 2016


> On Jul 28, 2016, at 7:37 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> 
> Because in reality the NA may appear in one variable but not in others. For
> example, for ID=1, CL may be NA while the other variables are not; for ID=2,
> V1 may be NA, etc. To keep all the IDs and all the variables in one data
> frame, it's inevitable to see some NAs.

That doesn't seem to acknowledge Newmiller's advice. In particular, this would have seemed to be an obvious response to that suggestion:

do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, na.omit(data[summary.var]))), 3))
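
Note that na.omit() applied to a data frame drops every row that has an NA in any of the selected columns, so an NA in CL also discards that row's V1, V2 and ALPHA values. If each variable should keep all of its own non-missing values, a column-wise variant along these lines might be closer to what is wanted. This is only a sketch, and the name do.stats.bycol is made up for illustration:

```r
# Hypothetical variant: remove NAs separately within each column, so a
# missing CL value does not discard the V1, V2 or ALPHA value of that row.
do.stats.bycol <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    sapply(data[summary.var], function(x) match.fun(func)(na.omit(x)))), 3))

# Same shape of test data as in the original post, with one NA in CL.
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

do.stats.bycol(test, stats.func = c('mean', 'sd', 'median', 'min', 'max'),
               summary.var = c('CL', 'V1', 'V2', 'ALPHA'))
```

With this version the CL row is computed from the 99 remaining values, while V1, V2 and ALPHA still use all 100.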


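As for the question as originally posed, extra arguments such as na.rm can be forwarded to every summary function through `...` without redefining any of them. A minimal sketch, assuming every function named in stats.func accepts an na.rm argument (mean, sd, median, min and max all do); do.stats.dots is a made-up name:

```r
# Hypothetical variant: anything passed after summary.var is handed on
# unchanged to each summary function via `...`.
do.stats.dots <- function(data, stats.func, summary.var, ...)
  as.data.frame(signif(sapply(stats.func, function(func)
    sapply(data[summary.var], func, ...)), 3))

test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

# na.rm = TRUE reaches mean(), sd(), median(), min() and max() alike.
do.stats.dots(test, stats.func = c('mean', 'sd', 'median', 'min', 'max'),
              summary.var = c('CL', 'V1', 'V2', 'ALPHA'), na.rm = TRUE)
```

Calling it without na.rm = TRUE behaves like the original do.stats, so the default is unchanged. A function in stats.func that does not accept na.rm would need to be wrapped first.
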
And please also heed the advice in the Posting Guide to use plain text.

-- 
David.



> 
> On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
> 
>> Why not remove it yourself before passing it to those functions?
>> --
>> Sent from my phone. Please excuse my brevity.
>> 
>> On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
>>> Dear list,
>>> 
>>> I wrote a small function to calculate multiple stats on multiple variables
>>> and export them in exactly the format I want. Everything seems fine until
>>> an NA appears in the data.
>>> 
>>> Here is my function:
>>> 
>>> do.stats <- function(data, stats.func, summary.var)
>>>   as.data.frame(signif(sapply(stats.func, function(func)
>>>     mapply(func, data[summary.var])), 3))
>>> 
>>> A test dataset:
>>> test <-
>>> data.frame(ID=1:100,CL=rnorm(100),V1=rnorm(100),V2=rnorm(100),ALPHA=rnorm(100))
>>> 
>>> a command like the following
>>> do.stats(test, stats.func=c('mean','sd','median','min','max'),
>>> summary.var=c('CL','V1', 'V2','ALPHA'))
>>> 
>>> gives me
>>> 
>>>          mean    sd  median   min  max
>>> CL     0.1030 0.917  0.0363 -2.32 2.47
>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>> 
>>> 
>>> However, if I have an NA in the data
>>> test$CL[1] <- NA
>>> 
>>> Running the same command gives me
>>>          mean    sd  median   min  max
>>> CL         NA    NA      NA    NA   NA
>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>> 
>>> I know this is because those functions (mean, sd, etc.) all have
>>> na.rm=FALSE by default. How can I pass na.rm=TRUE to all these functions
>>> without manually redefining them?
>>> 
>>> Appreciate any comment.
>>> 
>>> Thanks for your help.
>>> 
>>> 
>>> Jun
>>> 
>>>      [[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 

David Winsemius
Alameda, CA, USA