[R] Survey Design / Rake questions

Wed Aug 20 16:12:38 CEST 2008

On Mon, Aug 18, 2008 at 6:18 PM, Farley, Robert <FarleyR at metro.net> wrote:
> My motivation is to try to correct for a "time on board" bias we see in
> our surveys.  Not surprisingly, riders who are only on board a short
> time don't attempt/finish our survey forms.  We're able to weight our
> survey to the "bus stop-on by bus run" level.

So is the problem catching the short rides in your sample, or getting
the short riders to complete the survey? If the former, then all you
have to do is weight by the inverse probability of selection (the
Horvitz-Thompson estimator). This probability is probably roughly
proportional to time on the bus, which in turn might be proportional
to the number of stops in the ride. You may not need any raking for
that, just some algebra to compute those probabilities of selection.
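
To make the algebra concrete, here is a minimal sketch in Python (not
R), assuming the selection probability is exactly proportional to the
number of stops ridden. The function names and the toy numbers are
made up for illustration. Because the probabilities are known only up
to a constant, the mean below is the ratio (Hajek) form of the
weighted estimator rather than the plain Horvitz-Thompson total.

```python
def inverse_probability_weights(stops_ridden):
    """Weights proportional to 1 / P(selection).

    Assumption (not from the survey itself): P(selection) is
    proportional to the number of stops a rider stays on board.
    """
    if any(s <= 0 for s in stops_ridden):
        raise ValueError("every rider must ride at least one stop")
    return [1.0 / s for s in stops_ridden]


def weighted_mean(values, weights):
    """Ratio-type weighted mean; the unknown proportionality constant
    in the weights cancels between numerator and denominator."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical sample: one short ride and three longer ones.
stops = [2, 10, 10, 20]
answer = [1, 0, 0, 1]   # some yes/no survey item

weights = inverse_probability_weights(stops)
estimate = weighted_mean(answer, weights)
```

The short ride gets the largest weight, so the weighted mean is pulled
toward the short riders' answers compared with the raw sample mean.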

If the latter, then it is a nonresponse problem. If you think that the
only thing that determines whether a person responds is the length of
the ride, and you observe that length, then your data are "missing at
random" (MAR), one of several standard concepts in the missing-data
literature (http://www.citeulike.org/user/ctacmo/article/553290). In
survey statistics, this is handled with weights, again. Here, within
each ride-length group, you would scale up the respondents' weights by
the inverse of the fraction who did complete the survey.
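
A minimal sketch of that adjustment in Python (not R), under the
assumptions that every sampled rider, respondent or not, can be
assigned to a ride-length class (say, from boarding counts) and that
response depends only on that class. The class labels, function name,
and numbers are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(base_weights, classes, responded):
    """Weighting-class nonresponse adjustment.

    Within each class, respondents' base weights are multiplied by
    (weight of everyone sampled) / (weight of respondents), i.e. the
    inverse of the weighted response rate. Nonrespondents get 0.
    """
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for w, c, r in zip(base_weights, classes, responded):
        sampled_total[c] += w
        if r:
            respondent_total[c] += w

    # A class with no respondents cannot be adjusted this way.
    for c in sampled_total:
        if respondent_total[c] == 0:
            raise ValueError(f"no respondents in class {c!r}")

    return [
        w * sampled_total[c] / respondent_total[c] if r else 0.0
        for w, c, r in zip(base_weights, classes, responded)
    ]


# Hypothetical data: only 1 of 4 short riders filled in the form.
base = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
ride_class = ["short", "short", "short", "short", "long", "long"]
answered = [True, False, False, False, True, True]

adjusted = nonresponse_adjusted_weights(base, ride_class, answered)
```

The adjusted weights sum to the same total as the base weights within
each class, so the respondents stand in for the nonrespondents who
rode a similar distance.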

In a more difficult situation, your response probability might depend
on other factors, say passenger demographics, time of day, etc. I
would imagine you would still have MAR data, unless you have some
weird questions like "Do you carry firearms on the bus?", which people
who did have guns at the time of their ride would probably decline to
answer, making the data informatively missing, i.e. not missing at
random (NMAR).

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.