[R] Tabulating Baseline Characteristics on specific observations

Wed Sep 21 02:20:00 CEST 2011

You could use the na.action function on the fitted
object to see which observations were omitted.  E.g.,
let's make a data.frame that we can actually run some
regressions on and try na.action():

  > d <- data.frame(V1=11:15, V2=log(c(1,NA,NA,4,5)), V3=sqrt((-1):3), V4=sin(1:5))
  Warning message:
  In sqrt((-1):3) : NaNs produced
  > d
    V1       V2       V3         V4
  1 11 0.000000      NaN  0.8414710
  2 12       NA 0.000000  0.9092974
  3 13       NA 1.000000  0.1411200
  4 14 1.386294 1.414214 -0.7568025
  5 15 1.609438 1.732051 -0.9589243
  > fit12 <- lm(V1 ~ V2, data=d, na.action=na.omit)
  > if (length(na.action(fit12))>0) d[-na.action(fit12), ] else d
    V1       V2       V3         V4
  1 11 0.000000      NaN  0.8414710
  4 14 1.386294 1.414214 -0.7568025
  5 15 1.609438 1.732051 -0.9589243
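
To get from there to the baseline table asked about in the original
question, the same idea can be wrapped in a small function.  This is only
a sketch: used_rows() is a made-up helper name (not part of base R), and
it assumes the model was fitted to the whole data.frame, with no subset=
argument and the default row order.

```r
# Same toy data as above; suppressWarnings() hides the sqrt(-1) NaN warning.
d <- suppressWarnings(
  data.frame(V1 = 11:15, V2 = log(c(1, NA, NA, 4, 5)),
             V3 = sqrt((-1):3), V4 = sin(1:5)))
fit12 <- lm(V1 ~ V2, data = d, na.action = na.omit)

# used_rows() is a hypothetical helper, not a base R function.
# na.action() returns NULL when no rows were dropped, hence the length check.
used_rows <- function(fit, data) {
  omitted <- na.action(fit)
  if (length(omitted) > 0) data[-as.integer(omitted), , drop = FALSE] else data
}

d_used <- used_rows(fit12, d)

# Column means over the estimation sample only.  na.rm = TRUE because
# columns that were not in the model (here V3) may still contain NA/NaN.
baseline_means <- colMeans(d_used, na.rm = TRUE)
print(baseline_means)
```

The rows kept are those that lm() actually used, and all columns are
retained, including ones that were not in the formula.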

You can also call na.action on the output of na.omit (or
na.exclude) itself, but then you have to remember which
variables were in the model.
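
A sketch of that second approach, using the same toy data.  The name
model_vars is just an illustration; it has to be kept in step with the
model formula by hand, which is the drawback mentioned above.

```r
# Same toy data as in the example above.
d <- suppressWarnings(
  data.frame(V1 = 11:15, V2 = log(c(1, NA, NA, 4, 5)),
             V3 = sqrt((-1):3), V4 = sin(1:5)))

# Variables in the model; must be maintained by hand to match the formula.
model_vars <- c("V1", "V2")

# na.omit() records the dropped rows in an "na.action" attribute,
# which na.action() retrieves (NULL if nothing was dropped).
dropped <- na.action(na.omit(d[, model_vars]))

# Keep all columns, but only the rows that are complete in model_vars.
d_used <- if (length(dropped) > 0) d[-as.integer(dropped), ] else d
print(d_used)
```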

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of justin jarvis
> Sent: Tuesday, September 20, 2011 4:38 PM
> To: David Winsemius
> Cc: r-help at r-project.org
> Subject: Re: [R] Tabulating Baseline Characteristics on specific observations
> 
> That still discards the other data columns.  For example, in the data frame
> 
>   V1 V2 V3 V4
> 1  1  1 NA  1
> 2  1 NA  1  1
> 3  1 NA  1  1
> 4  1  1  1  1
> 5  1  1  1  1
> 
> Suppose I was running a regression using V1 and V2.  R will remove rows 2
> and 3 due to the "NA."  I would like a way to look at only the observations
> used for the regression, the data frame:
> 
>   V1 V2 V3 V4
> 1  1  1 NA  1
> 4  1  1  1  1
> 5  1  1  1  1
> 
> If I run na.omit(subset(dataframe, select = c(V1, V2))) it returns
> 
>   V1 V2
> 1  1  1
> 4  1  1
> 5  1  1
> 
> Sorry for being unclear the previous time.
> 
> Justin
> 
> On Tue, Sep 20, 2011 at 4:54 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> >
> > On Sep 19, 2011, at 8:49 PM, justin jarvis wrote:
> >
> >> I have a data set with many missing observations.  When I run a
> >> regression, R of course discards the observations (the whole row) that
> >> have "NA".  I want to tabulate some baseline characteristics (column
> >> means) but only for the observations that R used for the regression.
> >> I tried to recreate this data frame by using na.omit on the original
> >> data frame, but this will not work as this will discard an observation
> >> with an "NA" in any column, and not just in the covariates.
> >>
> >> In summary, I only want to remove observations that have an "NA" in
> >> the covariate columns.  Something like Stata's e(sample), as far as I
> >> can tell.
> >>
> >
> > na.omit(subset(dfrm, select = <covariate-vector>))  # or equivalent
> >
> >>
> >> Justin Jarvis
> >> PhD student, University of California, Irvine
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> >
> 