[R] rpart and survey weights

Erofili Grapsa erwfili at gmail.com
Wed Jan 27 12:56:28 CET 2016


Dear R users,

I have a question regarding rpart and survey weights. The introduction to
rpart document says "Weights are not yet supported, and will be ignored if
present". They are evidently used in some way, however, because the results
differ with and without weights. Can weights now be used and, if so, what
kind of weights? Can survey weights be used safely? These are my results
with weights:

Classification tree:
rpart(formula = cl2m ~ age + day + Employed + media + geo + soclass +
    persinc + hhsizeM + nfadult + nmadult + childshM, data = tum,
    weights = tum$pweight, method = "class", control = rpart.control(xval = 10,
        minbucket = 2, cp = 0))

Variables actually used in tree construction:
[1] age      day      Employed geo      hhsizeM  media    soclass

Root node error: 11950440/16768 = 712.69

n= 16768

         CP nsplit rel error  xerror       xstd
1 0.1980770      0   1.00000 1.00000 0.00016997
2 0.1405072      1   0.80192 0.80192 0.00017852
3 0.0300841      2   0.66142 0.66142 0.00017714
4 0.0053155      3   0.63133 0.63133 0.00017604
5 0.0025728      4   0.62602 0.62819 0.00017591
6 0.0020625      6   0.62087 0.62326 0.00017570
7 0.0020000      9   0.61468 0.62233 0.00017566

and without weights:

Classification tree:
rpart(formula = cl2m ~ age + day + Employed + media + geo + soclass +
    persinc + hhsizeM + nfadult + nmadult + childshM, data = tum,
    method = "class", control = rpart.control(xval = 10, minbucket = 2,
        cp = 0))

Variables actually used in tree construction:
[1] age      day      Employed media

Root node error: 10954/16768 = 0.65327

n= 16768

        CP nsplit rel error  xerror      xstd
1 0.192624      0   1.00000 1.00000 0.0056261
2 0.157020      1   0.80738 0.80738 0.0059018
3 0.030856      2   0.65036 0.65218 0.0058457
4 0.012872      3   0.61950 0.62050 0.0058038
5 0.002000      4   0.60663 0.60809 0.0057845

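Does the root node error make sense when using survey weights? With weights
it is 11950440/16768 = 712.69, which is not a proportion, whereas without
weights it is 10954/16768 = 0.65327. How should I interpret the weighted
value?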
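
To make the question concrete, here is a minimal sketch of how the two parts
of that ratio could be inspected. It assumes that the printed root node error
is the root node's dev divided by its n, and that with default priors and
loss dev is the total weight outside the heaviest class. The kyphosis data
(shipped with rpart) and the random weights w are illustrative stand-ins, not
my survey data.

```r
# Illustrative only: kyphosis ships with rpart; the weights below are made up
# and stand in for survey weights such as tum$pweight.
library(rpart)

set.seed(1)
w <- runif(nrow(kyphosis), min = 100, max = 1500)  # hypothetical weights

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             weights = w, method = "class")

root <- fit$frame[1, ]   # the first row of the frame is the root node

# Total weight per class, and the weight outside the heaviest class
w_class <- tapply(w, kyphosis$Kyphosis, sum)
w_miss  <- sum(w) - max(w_class)

cat("root dev (numerator)       :", root$dev, "\n")
cat("weight outside modal class :", w_miss, "\n")
cat("root n (unweighted count)  :", root$n, "\n")
cat("root wt (sum of weights)   :", root$wt, "\n")
cat("dev / n                    :", root$dev / root$n, "\n")
cat("dev / wt                   :", root$dev / root$wt, "\n")
```

If those assumptions hold, the numerator is on the scale of the weights while
the denominator is the raw number of rows, and dividing dev by wt instead of
n would give a weighted misclassification proportion between 0 and 1.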

Regards
Erofili
