[R] Problems with weight

Milan Bouchet-Valat nalimilan at club.fr
Wed Nov 28 19:26:29 CET 2012


On Wednesday, 28 November 2012 at 14:20 -0300, Pablo Menese wrote:
> Dear Milan... are you serious? 
> Did you read this?
No, I had not read this message when I wrote my reply, because you sent
two completely different messages in two different threads at about the
same time. As you can see, I was replying to the other message, which
only mentioned glm().

> I have this problem.
>
> test <- svydesign(id=~1,weights=~peso)
>
> logit <- svyglm(bach ~ job2 + mujer + egp4 + programa + delay + mdeo +
> str + evprivate, family=binomial,design=test)
>
> then this appears:
>
> Error in svyglm.survey.design(bach ~ job2 + mujer + egp4 + programa +  :
>   all variables must be in design= argument
> 
> 
> I don't know what this means...
> Please help.
Have you read ?svydesign? It has a "variables" argument that you can use
to specify the variables you need to include in the design object. The
documentation says:

  variables: Formula or data frame specifying the variables measured in
             the survey. If ‘NULL’, the ‘data’ argument is used.

So if you want to include all variables from your original data set,
pass the data frame as the "data" argument, and that's all: with
"variables" left at NULL, every column of that data frame becomes part
of the design and is available to svyglm().
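
As a minimal sketch of that advice: the data frame `mydata` and its
simulated values below are hypothetical stand-ins for your real data set,
and only the column names come from your call. I also use quasibinomial()
here, an assumption on my part, because ?svyglm suggests it to avoid
warnings about non-integer numbers of successes.

```r
library(survey)

# Hypothetical stand-in for the real data set: simulated values only
set.seed(1)
n <- 200
mydata <- data.frame(
  bach     = rbinom(n, 1, 0.5),
  mujer    = rbinom(n, 1, 0.5),
  programa = rbinom(n, 1, 0.5),
  peso     = runif(n, 1, 40)
)

# Passing data= puts every column of mydata into the design object
test <- svydesign(id = ~1, weights = ~peso, data = mydata)

# No weights= argument here: the weights are already part of the design
logit <- svyglm(bach ~ mujer + programa,
                family = quasibinomial(), design = test)
summary(logit)
```

With the real data you would pass your own data frame as data= and use
your full model formula.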


But first, stop using attach(): it creates confusion, and it is probably
the reason why you did not think of passing your data.frame object to
svydesign().
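
To illustrate the alternative to attach(), here is a small base-R sketch.
The data frame `d` and its values are made up for the example. Passing
the data frame through the data= argument keeps it clear where each
variable comes from:

```r
# Hypothetical toy data, invented for illustration only
d <- data.frame(
  y = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 0),
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  w = c(1, 2, 1, 3, 2, 1, 2, 3, 1, 2)
)

# Instead of attach(d) followed by glm(y ~ x, weights = w, ...),
# name the data frame explicitly so y, x and w are looked up in it
fit <- glm(y ~ x, family = quasibinomial(link = "logit"),
           weights = w, data = d)

# Nothing was added to the search path; the call itself names its data
coef(fit)
```

The same habit carries over to svydesign(), which also takes a data=
argument.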


Regards

> Quotes from a week ago...
> I could not perform anything using svyglm... I wish... but... I don't
> know why...
> 
> 
> On Tue, Nov 27, 2012 at 6:54 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>         On Tuesday, 27 November 2012 at 18:33 -0300, Pablo Menese wrote:
>         > I can't ... I don't know why but I can't
>         >
>         > When I use it:
>         >
>         > logit <- glm(bach ~ egp4 + programa, weights=wst7,
>         > family=quasibinomial(link="logit"))
>         
>         You were advised to use svyglm(), not glm(). It's usually considered
>         polite to carefully read the answers you get to your questions...
>         
>         
>         Regards
>         
>         > I get the same betas as in Stata, but the hypothesis tests, the
>         > t values, and the standard errors are different.
>         >
>         > I think the solution can't be far from this...
>         >
>         >
>         > On Fri, Nov 23, 2012 at 9:49 PM, Anthony Damico <ajdamico at gmail.com> wrote:
>         >
>         > > from your stata output, it looks like you need to use the survey
>         > > package in R
>         > >
>         > > for step-by-step instructions about how to do this (and comparisons
>         > > to stata), see
>         > >
>         > > http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
>         > >
>         > > once you're ready to run the regression, use svyglm() instead of
>         > > glm() and drop the weights argument (since it will already be part
>         > > of the survey design)   :)
>         > >
>         > >
>         > >
>         > > On Fri, Nov 23, 2012 at 3:13 PM, Pablo Menese <pmenese at gmail.com> wrote:
>         > >
>         > >> Until a week ago I used Stata for everything.
>         > >> Now I'm learning R and trying to move. At this stage I'm testing R
>         > >> by trying to do the same things I used to do in Stata, with the
>         > >> same outputs.
>         > >> I have a problem with the logit when applying weights.
>         > >>
>         > >> in stata I have this output
>         > >> . svy: logit bach job2 mujer i.egp4 programa delay mdeo i.str evprivate
>         > >> (running logit on estimation sample)
>         > >>
>         > >> Survey: Logistic regression
>         > >>
>         > >> Number of strata   =         1
>         > >> Number of PSUs     =       248
>         > >> Number of obs      =       248
>         > >> Population size    = 5290.1639
>         > >> Design df          =       247
>         > >> F(  11,    237)    =      4.39
>         > >> Prob > F           =    0.0000
>         > >>
>         > >>                          Linearized
>         > >>         bach       Coef.  Std. Err.       t     P>t    [95% Conf. Interval]
>         > >>
>         > >>         job2   -.4437446   .4385934   -1.01   0.313   -1.307605    .4201154
>         > >>        mujer    1.070595   .4169919    2.57   0.011    .2492812    1.891908
>         > >>
>         > >>         egp4
>         > >>            2   -.4839342    .539808   -0.90   0.371   -1.547148    .5792796
>         > >>            3   -1.288947   .5347344   -2.41   0.017   -2.342168   -.2357263
>         > >>            4   -.8569793   .5106425   -1.68   0.095   -1.862748    .1487898
>         > >>
>         > >>     programa    .9694352   .5677642    1.71   0.089   -.1488415    2.087712
>         > >>        delay   -1.552582   .5714967   -2.72   0.007   -2.678211    -.426954
>         > >>         mdeo   -.7938904   .3727571   -2.13   0.034   -1.528078   -.0597025
>         > >>
>         > >>          str
>         > >>            2   -1.122691   .5731879   -1.96   0.051    -2.25165    .0062682
>         > >>            3   -2.056682   .6350485   -3.24   0.001   -3.307483   -.8058812
>         > >>
>         > >>    evprivate   -1.962431   .5674143   -3.46   0.001   -3.080018   -.8448431
>         > >>        _cons    2.308699   .7274924    3.17   0.002    .8758187    3.741578
>         > >>
>         > >> The best I got in R was:
>         > >>
>         > >> glm(formula = bach ~ job2 + mujer + egp4 + programa + delay +
>         > >>     mdeo + str + evprivate, family = quasibinomial(link = "logit"),
>         > >>     weights = wst7)
>         > >>
>         > >> Deviance Residuals:
>         > >>      Min        1Q    Median        3Q       Max
>         > >> -12.5951   -3.9034   -0.9412    3.8268   11.2750
>         > >>
>         > >> Coefficients:
>         > >>                            Estimate Std. Error t value Pr(>|t|)
>         > >> (Intercept)                  2.3087     0.7173   3.218  0.00147 **
>         > >> job2                        -0.4437     0.4355  -1.019  0.30926
>         > >> mujer                        1.0706     0.3558   3.009  0.00290 **
>         > >> egp4intermediate (iii, iv)  -0.4839     0.4946  -0.978  0.32890
>         > >> egp4skilled manual workers  -1.2889     0.5268  -2.447  0.01514 *
>         > >> egp4working class           -0.8570     0.4625  -1.853  0.06514 .
>         > >> programa                     0.9694     0.4951   1.958  0.05141 .
>         > >> delay                       -1.5526     0.4878  -3.183  0.00166 **
>         > >> mdeo                        -0.7939     0.4207  -1.887  0.06037 .
>         > >> strest. ii                  -1.1227     0.4809  -2.334  0.02042 *
>         > >> strestr. iii                -2.0567     0.5134  -4.006 8.28e-05 ***
>         > >> evprivate                   -1.9624     0.6490  -3.024  0.00277 **
>         > >> ---
>         > >> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
>         > >>
>         > >> (Dispersion parameter for quasibinomial family taken to be 23.14436)
>         > >>
>         > >>     Null deviance: 7318.5  on 246  degrees of freedom
>         > >> Residual deviance: 5692.8  on 235  degrees of freedom
>         > >>   (103 observations deleted due to missingness)
>         > >> AIC: NA
>         > >>
>         > >> Number of Fisher Scoring iterations: 6
>         > >>
>         > >> Warning message:
>         > >> In summary.glm(logit) :
>         > >>   observations with zero weight not used for calculating dispersion
>         > >>
>         > >> This gives the same betas, but the hypothesis tests have different values...
>         > >>
>         > >>
>         > >> HELP!!!!
>         > >>
>         > >>         [[alternative HTML version deleted]]
>         > >>
>         > >>
>         > >> ______________________________________________
>         > >> R-help at r-project.org mailing list
>         > >> https://stat.ethz.ch/mailman/listinfo/r-help
>         > >> PLEASE do read the posting guide
>         > >> http://www.R-project.org/posting-guide.html
>         > >> and provide commented, minimal, self-contained, reproducible code.
>         > >>
>         > >>
>         > >
>         >
>         
>         
> 
>



