[R] Error of Stepwise Regression with number of rows in use has changed: remove missing values?

Greg Snow Greg.Snow at imail.org
Fri Feb 19 21:57:29 CET 2010


Have you considered the implications of that solution?
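
To make the question concrete, here is a small sketch with made-up data. The data frame `d` and the column `notes` are hypothetical and not from the original post. It assumes the concern is that na.omit() applied to a whole data frame drops every row that has an NA in any column, including columns the model never uses.

```r
# Made-up data: 'notes' is a hypothetical column that is NOT in the model
set.seed(1)
n <- 100
d <- data.frame(
  y     = rnorm(n),
  x1    = rnorm(n),
  x2    = rnorm(n),
  notes = rnorm(n)
)
d$x2[1:5]     <- NA   # 5 rows missing a predictor that is in the model
d$notes[6:45] <- NA   # 40 rows missing a column the model never uses

# na.omit() on the whole data frame drops rows with an NA in ANY column
d_all <- na.omit(d)

# Restricting to the model variables first drops only the rows that matter
model_vars <- c("y", "x1", "x2")
d_model <- na.omit(d[, model_vars])

nrow(d)        # 100
nrow(d_all)    # 55
nrow(d_model)  # 95

# With no NAs left, every candidate model in step() uses the same rows
fit <- step(lm(y ~ x1 + x2, data = d_model), trace = 0)
nobs(fit)      # 95
```

Either way this is still a complete-case analysis. Whether discarding those cases is acceptable depends on why the values are missing, which is a question about the data rather than about step().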

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Kum-Hoe Hwang
> Sent: Wednesday, February 17, 2010 1:41 AM
> To: r-help at r-project.org
> Subject: Re: [R] Error of Stepwise Regression with number of rows in
> use has changed: remove missing values?
> 
> I thank those who helped to solve an error in stepwise regression with
> missing values.
> 
> 
> Kum
> 
> 
> A good solution that I have tried was Andreas's advice.
> 
> =====================================================================
> 
> Try
> 
> data <- na.omit(original_data)  # before you run step() or stepAIC()
> 
> On Tue, Feb 16, 2010 at 8:09 PM, Peter Ehlers <ehlers at ucalgary.ca>
> wrote:
> 
> > On 2010-02-16 1:24, Kum-Hoe Hwang wrote:
> >
> >> Howdy, R Gurus
> >>
> >> I have enjoyed R, but I cannot solve one problem easily. Please help
> >> me with it.
> >> When I ran the R script, I got the following error. The error
> >> arises with an input data file exported from Excel spreadsheet
> >> software.
> >>
> >>  Error in step(lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
> >> as.numeric(nation.grant) +  :
> >>   number of rows in use has changed: remove missing values?
> >>
> >> Could you point me toward a way to resolve this error?
> >> Thanks in advance,
> >>
> >
> > This is a common situation when you use step() on data where
> > the predictors have missing values.
> >
> > A case (row) is included in the model only if all the
> > predictors for that model are non-missing for the case.
> >
> > As you vary which predictors are to be in the model, the
> > included cases will vary, resulting in models based on
> > different data. (Think of your cases as subjects; you want
> > all your models to be based on the same set of subjects.)
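
A minimal sketch of this effect, using made-up data rather than the poster's borderI.data (the names `d`, `full` and `reduced` are hypothetical). Only `x2` has missing values, so a model that includes `x2` is fitted to fewer rows than a model without it:

```r
# Made-up data: only x2 has missing values
set.seed(42)
n <- 50
d <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
d$x2[1:10] <- NA

# Rows with an NA in any model variable are dropped before fitting
full    <- lm(y ~ x1 + x2, data = d, na.action = na.omit)
reduced <- lm(y ~ x1,      data = d, na.action = na.omit)

nobs(full)     # 40: rows 1-10 are excluded because x2 is missing
nobs(reduced)  # 50: x2 is not in the model, so all rows are used
```

The two fits are based on different sets of cases, so their AIC values are not comparable. This is the situation that step() detects when it reports that the number of rows in use has changed.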
> >
> > Finally: (Re-)read the help page and note the 'warning'.
> >
> >  -Peter Ehlers
> >
> >
> >
> >>
> >>  ############### outputs from R console ###############
> >> > pop <- step(
> >> +   lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
> >> +        as.numeric(nation.grant) + as.numeric(do.grant) +
> >> +        as.numeric(city.grant) + as.numeric(DMZ.dist) +
> >> +        as.numeric(Seoul.dist),
> >> +      data = borderI.data, na.action = na.omit)
> >> + )
> >> Start:  AIC=494.27
> >> pop.rate ~ as.numeric(year) + as.factor(policy) + as.numeric(nation.grant) +
> >>     as.numeric(do.grant) + as.numeric(city.grant) + as.numeric(DMZ.dist) +
> >>     as.numeric(Seoul.dist)
> >>                            Df Sum of Sq    RSS    AIC
> >> - as.numeric(do.grant)      1      0.71 6622.9 492.28
> >> - as.factor(policy)         1      1.21 6623.4 492.29
> >> - as.numeric(DMZ.dist)      1      1.91 6624.1 492.30
> >> - as.numeric(city.grant)    1      5.07 6627.3 492.36
> >> - as.numeric(nation.grant)  1     11.51 6633.7 492.47
> >> - as.numeric(year)          1     29.58 6651.8 492.80
> >> <none>                                    6622.2 494.27
> >> - as.numeric(Seoul.dist)    1    673.22 7295.4 503.79
> >> Step:  AIC=492.28
> >> pop.rate ~ as.numeric(year) + as.factor(policy) + as.numeric(nation.grant) +
> >>     as.numeric(city.grant) + as.numeric(DMZ.dist) + as.numeric(Seoul.dist)
> >>                            Df Sum of Sq    RSS    AIC
> >> - as.factor(policy)         1      1.99 6624.9 490.32
> >> - as.numeric(DMZ.dist)      1      2.09 6625.0 490.32
> >> - as.numeric(city.grant)    1      7.18 6630.1 490.41
> >> - as.numeric(nation.grant)  1     20.08 6643.0 490.64
> >> - as.numeric(year)          1     28.89 6651.8 490.80
> >> <none>                                    6622.9 492.28
> >> - as.numeric(Seoul.dist)    1    697.46 7320.4 502.20
> >> Step:  AIC=490.32
> >> pop.rate ~ as.numeric(year) + as.numeric(nation.grant) + as.numeric(city.grant) +
> >>     as.numeric(DMZ.dist) + as.numeric(Seoul.dist)
> >>                            Df Sum of Sq    RSS    AIC
> >> - as.numeric(DMZ.dist)      1      2.08 6627.0 488.35
> >> - as.numeric(city.grant)    1     10.65 6635.6 488.51
> >> - as.numeric(nation.grant)  1     31.30 6656.2 488.88
> >> - as.numeric(year)          1     31.44 6656.4 488.88
> >> <none>                                    6624.9 490.32
> >> - as.numeric(Seoul.dist)    1    732.88 7357.8 500.80
> >> Step:  AIC=488.35
> >> pop.rate ~ as.numeric(year) + as.numeric(nation.grant) + as.numeric(city.grant) +
> >>     as.numeric(Seoul.dist)
> >>                            Df Sum of Sq    RSS    AIC
> >> - as.numeric(city.grant)    1      9.86 6636.9 486.53
> >> - as.numeric(year)          1     31.42 6658.4 486.92
> >> - as.numeric(nation.grant)  1     33.33 6660.3 486.95
> >> <none>                                    6627.0 488.35
> >> - as.numeric(Seoul.dist)    1    754.40 7381.4 499.18
> >>
> >> Error in step(lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
> >> as.numeric(nation.grant) +  :
> >>   number of rows in use has changed: remove missing values?
> >>
> >>
> >>
> >>
> >> --
> >> Kum-Hoe Hwang, Ph.D.
> >>
> >> Phone : 82-31-250-3516
> >> Email : phdhwang at gmail.com
> >>
> >>
> > --
> > Peter Ehlers
> > University of Calgary
> >
> 
> 
> 
> --
> Kum-Hoe Hwang, Ph.D.
> 
> Phone : 82-31-250-3516
> Email : phdhwang at gmail.com
> 


