[R] Quantile regression with complex survey data

Thomas Lumley tlumley at u.washington.edu
Thu Aug 21 22:49:34 CEST 2008



You can get point estimates by supplying the sampling weights as weights 
to the quantile regression functions in Roger Koenker's quantreg package. 
This is useful for smoothing (with the rqss() function); it is not clear 
how useful it is for straight-line regression.
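
A minimal sketch of the weighted point estimate described above. The 
data frame d and its columns y, x and w are simulated, hypothetical 
stand-ins (w plays the role of the sampling weight), not NHANES 
variables. This gives point estimates only; any standard errors printed 
by summary() on such a fit would ignore the survey design.

```r
library(quantreg)

set.seed(1)
n <- 200
# Hypothetical data: 'w' stands in for the survey sampling weight
d <- data.frame(x = runif(n, 0, 10), w = runif(n, 1, 5))
d$y <- 1 + 0.5 * d$x + rnorm(n)

# Weighted median regression (tau = 0.5): point estimates only
fit_w <- rq(y ~ x, tau = 0.5, weights = w, data = d)
coef(fit_w)
```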

You should get valid interval estimates from BRR or bootstrap replicate 
weights if you have sufficient sample size[*].  If I recall correctly, 
NHANES has two PSUs per stratum, so BRR replicates are possible.  Use 
as.svrepdesign() to create the BRR replicates and then withReplicates() to 
run the regression and get the standard errors.
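
The BRR steps above can be sketched roughly as follows. The design here 
is simulated and purely illustrative: 20 strata with two PSUs each, and 
hypothetical column names (stratum, psu, w, x, y). For NHANES you would 
substitute the stratum, PSU and weight variables documented for your 
survey cycle. Inside withReplicates() the replicate weights are 
available as .weights.

```r
library(survey)
library(quantreg)

set.seed(2)
# Hypothetical design: 20 strata, 2 PSUs per stratum, 10 people per PSU
d <- expand.grid(person = 1:10, psu = 1:2, stratum = 1:20)
d$w <- runif(nrow(d), 1, 5)             # stand-in sampling weights
d$x <- runif(nrow(d), 0, 10)
d$y <- 1 + 0.5 * d$x + rnorm(nrow(d))

# PSU ids repeat across strata, hence nest = TRUE
des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                 data = d, nest = TRUE)

# Two PSUs per stratum, so BRR replicates can be constructed
brr <- as.svrepdesign(des, type = "BRR")

# Refit the median regression once per set of replicate weights
est <- withReplicates(brr,
    quote(coef(rq(y ~ x, tau = 0.5, weights = .weights))))
est        # coefficients with replicate-based standard errors
SE(est)
```

Each BRR replicate gives half of the PSUs zero weight, so rq() may warn 
that a solution is nonunique for some replicates.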

You will not get correct interval estimates with jackknife replicates or 
by any Taylor-series based approach.

As an additional note, Yiling's two copies of the message to the list 
within half an hour (following one to me less than 24 hours earlier) 
suggest an unrealistic expectation of response times.

 	-thomas


[*] this isn't explicitly in the survey literature, but quantile 
regression is a Hadamard-differentiable functional of the empirical 
process, which should give it consistency, asymptotic Normality, and 
bootstrappability under various standard sets of asymptotics.


On Wed, 20 Aug 2008, Stas Kolenikov wrote:

> On Wed, Aug 20, 2008 at 8:12 AM, Cheng, Yiling (CDC/CCHP/NCCDPHP)
> <ycc1 at cdc.gov> wrote:
>> I am working on the NHANES survey data, and want to apply quantile
>> regression on these complex survey data. Does anyone know how to do
>> this?
>
> There are no references in technical literature (thinking, Annals,
> JASA, JRSS B, Survey Methodology). Absolutely none. Zero. You might be
> able to apply the procedure mechanically and then adjust the standard
> errors, but God only knows what the population equivalent is of
> whatever that model estimates. If there is a population analogue at
> all.
>
> In general, a quantile regression is a heavily model based concept:
> for each value of the explanatory variables, there is a well defined
> distribution of the response, and quantile regression puts additional
> structure on it -- linearity of quantiles with respect to some explanatory
> variables. That does not mesh well with the design paradigm according
> to which the survey estimation is usually conducted. With the latter,
> the finite population and characteristics of every unit are assumed
> fixed, and randomness comes only from the sampling procedure. Within
> that paradigm, you can define the marginal distribution of the
> response (or any other) variable, but the conditional distributions
> may simply be unavailable because there are no units in the population
> satisfying the conditions.
>
> -- 
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
