[R] svymeans question

Thu Aug 28 20:06:23 CEST 2008

Other people have explained that the issue is missing data.  I just wanted 
to note why svymean() with na.rm=TRUE keeps only the cases that are 
complete on all the variables in the formula: it computes the covariance 
matrix of all the means, and this can't really be done sensibly when the 
means are based on different subsets.
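
A minimal sketch of the behaviour, on made-up toy data rather than the
data from the original post (the names dat, des, a, b, joint and each are
hypothetical, and equal weights are assumed because no weights are
supplied). It compares the joint call with the complete-case means, and
then shows one possible way to get per-item means by calling svymean()
once per variable, at the cost of losing the covariances between items:

```r
library(survey)

# Hypothetical toy data: two items whose missing values fall in different rows
dat <- data.frame(
  schid = rep(1:4, each = 3),
  a = c(1, 0, 1, 1, NA, 0, 1, 1, 0, NA, 1, 0),
  b = c(0, 1, NA, 1, 1, 0, NA, 0, 1, 1, 0, 1)
)

# Cluster design with no weights (svydesign warns and assumes equal probability)
des <- svydesign(ids = ~schid, data = dat)

# Joint call: rows with an NA in either a or b are dropped before estimation
joint <- svymean(~a + b, des, na.rm = TRUE)

# Same subset by hand: rows complete on both variables
cc <- complete.cases(dat[, c("a", "b")])
cc_mean_a <- mean(dat$a[cc])

# One call per variable: each mean uses every non-missing value of that item
each <- vapply(
  c("a", "b"),
  function(v) coef(svymean(reformulate(v), des, na.rm = TRUE))[[1]],
  numeric(1)
)

print(coef(joint))  # complete-case means
print(each)         # per-item means, matching mean(x, na.rm = TRUE) here
```

The per-variable calls still give design-based standard errors for each
item, but they do not give a joint covariance matrix for the set of means.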

 	-thomas

On Tue, 26 Aug 2008, Doran, Harold wrote:

> I have the following code which produces the output below it
>
> clus1 <- svydesign(ids = ~schid, data = lower_dat)
> items <-  as.formula(paste(" ~ ", paste(lset, collapse= "+")))
> rr1 <- svymean(items, clus1, deff='replace', na.rm=TRUE)
>
>> rr1
>            mean       SE   DEff
> W525209 0.719748 0.015606 2.4932
> W525223 0.508228 0.027570 6.2802
> W525035 0.827202 0.014060 2.8561
> W525131 0.805421 0.015425 3.1350
> W525033 0.242982 0.020074 4.5239
> W525163 0.904647 0.013905 4.6289
> W525165 0.439981 0.020029 3.3620
> W525167 0.148112 0.013047 2.7860
> W525177 0.865924 0.014977 3.9898
> W525179 0.409003 0.020956 3.7515
> W525181 0.634076 0.022076 4.3372
> W525183 0.242498 0.019073 4.0894
> W525401 0.262343 0.021830 3.4354
> W525059 0.854792 0.016551 4.5576
> W525251 0.691191 0.025010 6.0512
> W525083 0.433204 0.017310 2.5200
> W525289 0.634560 0.012762 1.4504
> W524763 0.791868 0.014478 2.6265
> W524765 0.223621 0.019627 4.5818
> W524951 0.242982 0.016796 3.1669
> W524769 0.820910 0.016786 3.9579
> W524771 0.872701 0.015853 4.6712
> W524839 0.518877 0.026433 5.7794
> W525374 1.209584 0.043065 5.1572
> W524885 0.585673 0.027780 6.5674
> W525377 1.100678 0.050093 5.8851
> W524787 0.839303 0.012994 2.5852
> W524789 0.339787 0.019230 3.4041
> W524791 0.847047 0.012885 2.6461
> W524825 0.500968 0.021988 3.9935
> W524795 0.868345 0.014951 4.0377
> W524895 0.864472 0.013872 3.3917
> W524897 0.804937 0.020070 5.2977
> W524967 0.475799 0.032137 8.5511
> W525009 0.681994 0.018670 3.3188
>
> However, when I do the following:
>
> svymean(~W524787, clus1, deff='replace', na.rm=TRUE)
>            mean       SE   DEff
> W524787 0.855547 0.011365 4.1158
>
> Compare this to the W524787 row in the first table (ninth from the
> bottom, mean 0.839303) to see that it is different.
>
> Computing the mean of the item by itself with svymean agrees with the
> sample mean
>
>> mean(lower_dat$W524787, na.rm=TRUE)
> [1] 0.8555471
>
> Now, I know that there is a covariance between the variables, but I was
> under the impression that the sample mean was still of pragmatic
> utility, and that accounting for the sample design affects only the
> standard error.
>
> In the work I am doing, it is important for the mean of each item from
> svymean to be the same as the sample mean of that item computed by
> itself. It's a bit of a story as to why, and I can provide that info if
> relevant.
>
> I don't see an argument in svydesign or in svymean that would allow me
> to treat the variables as being independent. But maybe I am missing
> something else, and I would welcome any reactions.
>
> Harold

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle