[R] using a noisy variable in regression (not an R question)

Paul Johnson pauljohn32 at gmail.com
Sat Mar 7 20:58:02 CET 2009


On Sat, Mar 7, 2009 at 11:49 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> Hi, This is not an R question, but I've seen opinions given on non R
> topics, so I wanted to give it a try. :)
>
> How would one treat a variable that was measured once, but is known to
> fluctuate a lot? For example, I want to include a hormone in my
> regression as an explanatory variable. However, this hormone varies in
> its levels throughout a day. Nevertheless, its levels differ
> substantially between individuals so that there is information there
> to use.
>
> One simple thing to try would be to form categories, but I assume
> there are better ways to handle this. Has anyone worked with such
> data, or could anyone suggest some keywords that may be helpful in
> searching for this topic. Thanks for your input.
>

From teaching econometrics, I remember that if the "truth" is
y = b0 + b1*x1 + noise and you do not have a correct measure of x1,
but rather something like ex1 = x1 + noise, then the regression
estimate of b1 is biased, generally attenuated (pulled toward zero).
As far as I understand it, the technical solutions are not too
encouraging. You can try to get better data, or possibly build an
instrumental variables model, in which other predictors of the "true"
value of x1 enter a first-stage model. I don't recall that I was ever
able to persuade myself that this approach really solves anything, but
many people recommend it. I suppose a key question is whether you can
persuade your audience that ex1 = x1 + noise and that the noise is
well behaved.
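
To make the attenuation point concrete, here is a small simulation
sketch (plain Python, standard library only). Everything in it is an
assumption chosen for illustration: the coefficient values, the noise
standard deviations, and the helper names ols_slope and simulate. It
assumes "well behaved" noise in the classical sense, that is, noise
independent of x1 and of the regression error. Under that assumption
the slope on ex1 should shrink toward b1 * var(x1) / (var(x1) +
var(noise)).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, b0=1.0, b1=2.0, sd_x=1.0, sd_u=1.0, sd_e=1.0, seed=1):
    """Fit y on the true x1 and on the noisy ex1 = x1 + u; return both slopes.

    All parameter values are illustrative assumptions, not estimates
    from any real hormone data.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, sd_x) for _ in range(n)]        # unobserved truth
    ex1 = [x + rng.gauss(0.0, sd_u) for x in x1]         # what we measure
    y = [b0 + b1 * x + rng.gauss(0.0, sd_e) for x in x1]
    return ols_slope(x1, y), ols_slope(ex1, y)


slope_true, slope_noisy = simulate()

# Reliability ratio var(x1) / (var(x1) + var(u)) for the defaults above.
reliability = 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)

print("slope using x1 :", round(slope_true, 3))
print("slope using ex1:", round(slope_noisy, 3))
print("b1 * reliability:", 2.0 * reliability)
```

With equal variances for x1 and the measurement noise, the slope on
ex1 should come out at roughly half of b1. The more the hormone
fluctuates within a person relative to the differences between
people, the stronger the shrinkage.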

As I was considering your problem, I wondered whether there might be
a "mixed model" approach to it. You hypothesize that the truth is
y = b0 + b1*x1 + noise, but you don't have x1. So suppose you recast
the model with a random coefficient, as in y = b0 + c1*ex1 + noise.
ex1 is a fixed estimate of the hormone level for each observation, and
c1 is a random, varying coefficient because the effect of the hormone
fluctuates in an unmeasurable way. Then you could try to estimate the
distribution of c1.
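
One way this idea might be made concrete, offered only as a sketch and
not as a worked-out method: with a single measurement per person, a
random c1 shows up as error variance that grows with ex1^2, since
Var(y | ex1) = var(c1) * ex1^2 + var(noise). So one can fit ordinary
least squares for the mean of c1, then regress the squared residuals
on ex1^2 to get rough estimates of the two variances. This is in the
spirit of Hildreth-Houck random-coefficient regression, without its
finite-sample correction. The data below are simulated, and every
number and name (ols, c_mean, sd_c, sd_e) is an assumption made for
illustration.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(2009)
n = 40000

# Illustrative "true" values (assumptions, not from real data).
b0, c_mean, sd_c, sd_e = 1.0, 2.0, 0.5, 1.0

# One hormone measurement per person, and a person-specific coefficient c1.
ex1 = [rng.uniform(1.0, 5.0) for _ in range(n)]
y = [b0 + rng.gauss(c_mean, sd_c) * x + rng.gauss(0.0, sd_e) for x in ex1]

# Step 1: OLS estimates the intercept and the mean of c1.
b0_hat, c_mean_hat = ols(ex1, y)

# Step 2: squared residuals are roughly var(noise) + var(c1) * ex1^2,
# so regressing them on ex1^2 gives rough estimates of both variances.
resid_sq = [(yi - b0_hat - c_mean_hat * xi) ** 2 for xi, yi in zip(ex1, y)]
ex1_sq = [xi ** 2 for xi in ex1]
var_e_hat, var_c_hat = ols(ex1_sq, resid_sq)

print("mean of c1    :", round(c_mean_hat, 3))
print("variance of c1:", round(var_c_hat, 3))
print("noise variance:", round(var_e_hat, 3))
```

Note that this only describes how the effect of the measured value
varies. It does not by itself undo the attenuation from measuring x1
with error, and in real data the variance estimate for c1 can come out
negative, which would be a sign that the model does not fit.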

You have an interesting problem, I think.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



