[R] how to ignore rows missing arguments of a function when creating a function?

Joris Meys jorismeys at gmail.com
Wed Jun 9 11:33:11 CEST 2010


It's difficult to help you if we don't know what the data looks like.
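Two more tips:
- look at ?with instead of attach(). attach() leaves the data frame on the
search path after your function returns, which causes a lot of trouble
further on.
- use require() instead of library(), and check its return value: require()
only warns and returns FALSE when a package is missing, whereas library()
stops with an error. See also the help files on this.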
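
To illustrate the first tip, here is a minimal sketch with a made-up data
frame (the names dat, score and age are only for illustration, they are not
from your data). with() evaluates the expression inside the data frame and
leaves the search path untouched:

```r
# hypothetical example data, not from the original post
dat <- data.frame(score = c(10, 12, NA, 15), age = c(7, 8, 8, 9))

before <- search()

# the columns of dat are visible only inside the with() call
mean_score <- with(dat, mean(score, na.rm = TRUE))

# nothing was attached, so the search path is unchanged
identical(before, search())
```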

I also wonder what you're doing with the na.rm=TRUE. I don't see how
it affects the function, as it is used nowhere. Does it really remove
NA?

This said: estfun() extracts the estimating functions from your model
object, with one row per observation that was used in the fit. Your model
is essentially fit on all observations that are not NA, whereas I presume
your cluster contains all observations, including NA. That mismatch in
length is what tapply() complains about. Now I don't know what kind of
model fm represents, but normally there is information in the model object
about the removed observations. I illustrate this using lm:

> x <- rnorm(100)
> y <- c(rnorm(90),NA,rnorm(9))
> test <- lm(x~y)
> str(test)
List of 13
 ... (tons of information)
 $ model        :'data.frame':  99 obs. of  2 variables:
  ... (more tons of information)
  ..- attr(*, "na.action")=Class 'omit'  Named int 91
  .. .. ..- attr(*, "names")= chr "91"
 - attr(*, "class")= chr "lm"

Now we know that we can do:
> not <- attr(test$model, "na.action")
> y[-not]
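
As a quick check of the idea, here is a small sketch along the same lines. It
assumes the default na.action (na.omit); set.seed() is only there to make it
reproducible. After dropping the recorded index, the vector has as many
elements as the model has rows:

```r
set.seed(1)                          # only for reproducibility
x <- rnorm(100)
y <- c(rnorm(90), NA, rnorm(9))
test <- lm(x ~ y)

# na.action() is the extractor function for the same information
not <- na.action(test)

length(y)                  # full vector, still contains the NA
length(y[-not])            # vector without the dropped observation
nrow(model.frame(test))    # rows actually used in the fit
```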

So try this
### NOT TESTED
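cl <- function(dat, na.rm = TRUE, fm, cluster){
    # na.rm is kept so existing calls still work; it is not used
    stopifnot(require(sandwich), require(lmtest))

    # rows the model dropped because of NA (NULL if nothing was dropped)
    not <- attr(fm$model, "na.action")
    if (!is.null(not)) cluster <- cluster[-not]

    with(dat, {
        M <- length(unique(cluster))   # number of clusters
        N <- length(cluster)           # observations used in the fit
        K <- fm$rank                   # number of estimated coefficients

        # finite-sample degrees-of-freedom correction
        dfc <- (M/(M-1))*((N-1)/(N-K))

        # sum the estimating functions within each cluster; tapply()
        # returns one sum per cluster, so uj is a matrix for crossprod()
        uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))

        vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

        coeftest(fm, vcovCL)
    })
}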
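
A different route, sketched here only as an assumption about what you want:
drop the rows that have NA in the variables the function uses before fitting,
and ignore NA in all other columns. The data frame and the names score, age,
class and size are made up for the sketch:

```r
# hypothetical data: 'size' has missing values but is not used in the model
dat <- data.frame(
  score = c(10, 12, NA, 15, 11, 14),
  age   = c(7, 8, 8, 9, NA, 7),
  class = c("a", "a", "b", "b", "c", "c"),
  size  = c(NA, 120, 125, NA, 118, 122)
)

vars <- c("score", "age", "class")     # variables the function uses
keep <- complete.cases(dat[, vars])    # TRUE where none of them is NA
dat_used <- dat[keep, ]

fm <- lm(score ~ age, data = dat_used)

# the cluster variable and the model now have the same number of rows
nrow(dat_used) == nrow(model.frame(fm))
```

Rows with a missing size are kept here, because size is not listed in vars.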

Cheers
Joris

On Wed, Jun 9, 2010 at 12:06 AM, edmund jones <edmund.j.jones at gmail.com> wrote:
> Hi,
>
> I am relatively new to R; when creating functions, I run into problems with
> missing values. I would like my functions to ignore rows with missing values
> for the arguments of my function in the analysis (as is the case, for
> example, in Stata). Note that I don't want my function to drop rows if there
> are missing values elsewhere in a row, i.e. for variables that are not
> arguments of my function.
>
> As an example: here is a clustering function I wrote:
>
>
> cl <- function(dat, na.rm = TRUE, fm, cluster){
>
> attach( dat , warn.conflicts = F)
>
> library(sandwich)
>
> library(lmtest)
>
> M <- length(unique(cluster))
>
> N <- length(cluster)
>
> K <- fm$rank
>
> dfc <- (M/(M-1))*((N-1)/(N-K))
>
> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>
> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>
> coeftest(fm, vcovCL)
>
> }
>
>
> When I run my function, I get the message:
>
>
> Error in tapply(x, cluster, sum) : arguments must have same length
>
>
> If I specify instead attach(na.omit(dat), warn.conflicts = F)  and don't
> have the "na.rm=TRUE" argument, then my function runs; but only for the rows
> where there are no missing values AT ALL; however, I don't care if there are
> missing values for variables on which I am not applying my function.
>
>
> For example, I have information on children's size; if I want to regress
> scores on age and parents' education, clustering on class, I would like
> missing values in size not to interfere (i.e. if I have scores, age, parents'
> education, and class, but not size, I don't want to drop this observation).
>
>
> I tried to look at the code of "lm" to see how the na.action part works, but
> I couldn't figure it out... This is exactly how I would like to deal with
> missing values.
>
>
> I tried to write
>
> cl <- function(dat, fm, cluster, na.action){
>
> attach( dat , warn.conflicts = F)
>
> library(sandwich)
>
> library(lmtest)
>
>  M <- length(unique(cluster))
>
>  N <- length(cluster)
>
>  K <- fm$rank
>
>  dfc <- (M/(M-1))*((N-1)/(N-K))
>
> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>
> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>
>  coeftest(fm, vcovCL)
>
> }
>
>  attr(cl,"na.action") <- na.exclude
>
>
> but it still didn't work...
>
>
> Any ideas of how to deal with this issue?
>
> Thank you for your answers!
>
> Edmund



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be