[R] Automated Fixed Order Stepwise Regression Function

Frank Harrell f.harrell at vanderbilt.edu
Thu Apr 7 22:43:51 CEST 2011


This is a statistically invalid procedure.  I would not spend much time on
it.  It is a good idea to first bootstrap an idea or do regular Monte Carlo
simulation to see if the statistical properties are OK (confidence interval
coverage, bias, type I error, etc.).  You will find they are not in this
case.

Frank


Tyler Rinker wrote:
> 
> Greetings,
>  
> I am interested in creating a stepwise fixed order regression function. 
> There's a function for this already called add1( ).  The F statistics are
> calculated using type 2 anova (the SS and the F changes don't match
> SPSS's).  You can see my use of this at the very end of the email.
>  
> What I want: a function to make an anova table with f changes and delt
> R^2.  
>  
> I ran into 10 snags to making this a fully automated function using the
> full linear model (order matters here).  Each snag is marked with a
> Comment #.  Some snags are repeated because I couldn't do the first time
> and certainly couldn't do it after that. Help with any/all snags would be
> very appreciated. 
>  
> I'm a 2 1/2 month [R] user who's reading everything online (incl. manuals)
> & ordering every book I can (looking at Dalgaard, Crawly's and Teetor's
> very helpful books right now).  Loops and their usage is a foreign thing
> to me, despite studying them, and unfortunately I think that my function
> may call for them.  I also realize that beyond 10 predictors may make this
> function way to bulky.
>  
> I'm running the latest version of R (2.12.2)on a windows 7 machine.
> 
> DATASET
>  
> mtcars
> full.model<-lm(mpg~cyl+disp+hp+drat, data=mtcars)
>  
> CODE
> 
> stepFO<-function(model)
> {
> m<-data.frame(model.frame(model))
> num.of.var<-length(colnames(m))
> mod1<-lm(m[,1]~m[,2])
> mod2<-lm(m[,1]~m[,2]+m[,3])
> mod3<-lm(m[,1]~m[,2]+m[,3]+m[,4])
> mod4<-lm(m[,1]~m[,2]+m[,3]+m[,4]+m[,5])
> #Comment 1--I don't know how to automated this process(above) of adding 
> #...additional variables.  Probably a loop is needed but I don't
> understand 
> #...how to apply it here.  Maybe update.model [1:num.ofvar]?
> a1<-anova(mod1)
> a2<-anova(mod2)
> a3<-anova(mod3)
> a4<-anova(mod4)
> #Comment 2--SAME AS COMMENT 1 except applied to the anova tables.  How do
> I make
> #...[R] add a5, a6...an   as necessary?
> 
> rb<-rbind(a1,a2,a3,a4)
> #Comment 3--again I can't automate this to make the  addition of a's
> automated
> 
> anova1<-rbind(rb[1,],rb[4,],rb[8,],rb[13,],rb[14,])
> #Comment 4--the rb's follow a pattern of 1+3+4+5...+n variables
> #then I row bind these starting with 1 and rowbind one more after the last 
> #...rb to include the bottom piece of the anova table that tells 
> #...about residuals (how do I aoutomate this?)
> 
> anova<-anova1[,1:num.of.var]
> anova.table<-data.frame(anova)
> #Comment 5--Something that bugs me here is that I have to turn it into a
> dataframe to 
> #...add the totals row and the delta R^2 (tried playing w/ tkinsert to no
> avail)
> #...I miss the stuff that's at the bottom of the anova table (sig values)
> #Comment 6--I'd love to turn the place value to round to 2 after the
> decimal.
> #...I've worked this many ways including changing my options but this does 
> #...not seem to apply to a data frame
> 
> Total<-c(sum(anova[,1]),sum(anova[,2])," ", " ", " ")
> anova.table<-rbind(anova.table,Total)
> R1<-summary(mod1)[[8]][[1]]
> R2<-summary(mod2)[[8]][[1]]
> R3<-summary(mod3)[[8]][[1]]
> R4<-summary(mod4)[[8]][[1]]
> #Comment 7--SAME AS COMMENT 2.  How do I make
> #...[R] add R5, R6...Rn   as necessary?
> 
> deltaR.1<-R1
> deltaR.2<-R2-R1
> deltaR.3<-R3-R2
> deltaR.4<-R4-R3
> #Comment 8--SAME AS COMMENT 7.  How would I aoutomate this process?
> 
> Delta.R.Squared<-c(deltaR.1,deltaR.2,deltaR.3,deltaR.4," ","")
> #Comment 9--I need a way to add as many deltaR's as 
> #...necessary(n of R = n of predictors)
> 
> anova.table<-cbind(anova.table, Delta.R.Squared)
> colnames(anova.table)<-c("df","Sum Sq","Mean Sq","F-change",
> "P-value","Delta.R.Squared")
> rownames(anova.table)<-c("X1", "X2 elminating for X1", 
> "X3 eliminating for X2 & X3", "X4 eliminating for X1,X2, &
> X3","Residuals",
>  "Total")
> anova.table
> }
> #Comment 10--Again I would need [R] to automate the list for row names as
> we 
> #...add more predictors.
> #See the final product below I'm after (with two places after the decimal)
>> anova.table
>                                 df           Sum Sq          Mean Sq        
> F-change            P-value     Delta.R.Squared
> X1                               1 817.712952354614 817.712952354614
> 79.5610275293349 6.112687142581e-10   0.726180005093805
> X2 elminating for X1             1 37.5939529324851 37.5939529324851 
> 4.0268283172755 0.0541857201845285  0.0333857704630917
> X3 eliminating for X2 & X3       1 9.37092926438942 9.37092926438942
> 1.00388976918267  0.324951851250774 0.00832196853596723
> X4 eliminating for X1,Xx2, & X3  1 16.4674349222492 16.4674349222492
> 1.81550535203668  0.189048514740917  0.0146241073243205
> Residuals                       27 244.901918026262 9.07044140838007                                                
> Total                           31     1126.0471875  
>  
>  
> USING THE ADD1() FUNCTION>  NOT WHAT I WANT> 
>  
> mod1<-lm(mpg~cyl, data=mtcars)
> add1(mod1,~cyl+disp+hp+drat, data=mtcars, test="F")
> 
> Model:
> mpg ~ cyl
>        Df Sum of Sq    RSS    AIC F value   Pr(F)  
>               308.33 76.494                  
> disp    1    37.594 270.74 74.334  4.0268 0.05419 .
> hp      1    16.360 291.98 76.750  1.6249 0.21253  
> drat    1    15.841 292.49 76.807  1.5706 0.22012  
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> 
>  		 	   		  
> 	[[alternative HTML version deleted]]
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Automated-Fixed-Order-Stepwise-Regression-Function-tp3433343p3434522.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list