[R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression

Anthony Damico ajdamico at gmail.com
Fri Nov 18 11:15:23 CET 2016


hi, my code does not subset the survey design on the line that creates the
svrepdesign().  subsetting in order to create a variable while your data is
still a data.frame is probably okay, so long as you expect the observations
outside of the subset to be NAs like they are in this case.
nrow(elsq1ch_brr)==nrow(newdesign)

On Fri, Nov 18, 2016 at 5:06 AM, Courtney Benjamin <cbenjami at btboces.org>
wrote:

> ​Thank you, Anthony.  Your approach does work; however, I am concerned to
> some degree about subsetting the data prior to creating the new svrepdesign
> as I know that it is not recommended to subset the data prior to creating
> the svrepdesign object.  I am not sure if this is a significant concern in
> this context as the model was fitted with the original svrepdesign that was
> created prior to subsetting any data and the new svrepdesign is being used
> to run the diagnostic for the model.  Any thoughts on that issue?
>
> Also, from my understanding of the outcome of the diagnostic, low p values
> indicate a poor model fit.
>
> Sincerely,
>
> Courtney
>
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenjami at btboces.org
>
> 607-763-8633
> ------------------------------
> *From:* Anthony Damico <ajdamico at gmail.com>
> *Sent:* Thursday, November 17, 2016 4:28 AM
> *To:* Courtney Benjamin
> *Cc:* r-help at r-project.org
> *Subject:* Re: [R] Archer-Lemeshow Goodness of Fit Test for Survey Data
> with Log. Regression
>
> great minimal reproducible example, thanks.  does something like this work?
>
>
>
> #Log. Reg. model-all curric. concentrations including F1RTRCC as a
> predictor
> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
> summary(allCC)
>
> r <- residuals(allCC, type="response")
> f<-fitted(allCC)
> your_g<- cut(f, c(-Inf, quantile(f,  (1:9)/10, Inf)))
>
> elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'your_g' ] <- your_g
> elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'r' ] <- r
> newdesign<-svrepdesign(variables = elsq1ch, repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
>
> decilemodel<- svyglm(r~your_g, design=newdesign,subset=
> BYSCTRL==1&G10COHRT==1)
> regTermTest(decilemodel, ~your_g)
>
>
>
>
> On Wed, Nov 16, 2016 at 10:15 PM, Courtney Benjamin <cbenjami at btboces.org>
> wrote:
>
>> ?Hello R Experts,
>>
>> I am trying to implement the Archer-Lemeshow GOF Test for survey data on
>> a logistic regression model using the survey package based upon an R Help
>> Archive post that I found where Dr. Thomas Lumley advised how to do it:
>> http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-Comple
>> x-Survey-Logistic-Regression-td4668233.html
>>
>> Everything is going well until I get to the point where I have to add the
>> objects 'r' and 'g' as variables to the data frame by either using the
>> transform function or the update function to update the svrepdesign
>> object.  The log. regression model involved uses a subset of data and some
>> of the values in the data frame are NA, so that is affecting my ability to
>> add 'r' and 'g' as variables; I am getting an error because I only have
>> 8397 rows for the new variables and 16197 in the data frame and svrepdesign
>> object.  I am not sure how to overcome this error.
>>
>> The following is a MRE:
>>
>> ##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with
>> Logistic Regression
>>
>> library(RCurl)
>> library(survey)
>>
>> data <- getURL("https://raw.githubusercontent.com/cbenjamin1821/
>> careertech-ed/master/elsq1adj.csv")
>> elsq1ch <- read.csv(text = data)
>>
>> #Specifying the svyrepdesign object which applies the BRR weights
>> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
>> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
>> "BRR")
>> elsq1ch_brr
>>
>> ##Resetting baseline levels for predictors
>> elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
>> or Less") )
>> elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K")
>> )
>> elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") )
>> elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male") )
>> elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC =
>> relevel(F1RTRCC,"Academic") )
>>
>> #Log. Reg. model-all curric. concentrations including F1RTRCC as a
>> predictor
>> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGP
>> P2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,
>> subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
>> summary(allCC)
>>
>> #Recommendations from Lumley (from R Help Archive) on implementing the
>> Archer Lemeshow GOF test
>> r <- residuals(allCC, type="response")
>> f<-fitted(allCC)
>> g<- cut(f, c(-Inf, quantile(f,  (1:9)/10, Inf)))
>>
>> # now create a new design object with r and g added as variables
>> #This is the area where I am having problems as my model involves a
>> subset and some values are NA as well
>> #I am also not sure if I am naming/specifying the new variables of r and
>> g properly
>> transform(elsq1ch,r=r,g=g)
>> elsq1ch_brr <- update(elsq1ch_brr,tag=g,tag=r)
>> #then:
>> decilemodel<- svyglm(r~g, design=newdesign)
>> regTermTest(decilemodel, ~g)
>> #is the F-adjusted mean residual test from the Archer Lemeshow paper
>>
>> Thank you,
>> Courtney
>>
>> ?
>>
>> Courtney Benjamin
>>
>> Broome-Tioga BOCES
>>
>> Automotive Technology II Teacher
>>
>> Located at Gault Toyota
>>
>> Doctoral Candidate-Educational Theory & Practice
>>
>> State University of New York at Binghamton
>>
>> cbenjami at btboces.org<mailto:cbenjami at btboces.org>
>>
>> 607-763-8633
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list