[R] using a noisy variable in regression (not an R question)

Juliet Hannah juliet.hannah at gmail.com
Sat Mar 7 19:32:10 CET 2009


Thank you for your responses.

I should have emphasized that I do not intend to categorize, mainly
because of all the discussions I have seen on R-help arguing against
it.

I just thought it would be problematic to include the variable as is.
Compare it with other variables, such as genotype or BMI: if we
measured one of those again the next day, the value would be
essentially the same. A hormone's level, however, would not be. I
thought this measurement error must be accounted for somehow.
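
A minimal simulation sketch of the concern above, assuming a simple
additive error model that is not part of the thread: each person has a
stable "usual" hormone level T1, and the single measurement X1 equals
T1 plus independent day-to-day noise. All names and numbers
(between_sd, within_sd, true_slope, the sample size) are illustrative
assumptions. Under this model the slope estimated from X1 is expected
to shrink toward zero by the ratio of between-person variance to total
variance, which is usually discussed under keywords such as
"measurement error", "errors in variables", "regression dilution" and
"attenuation".

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size
between_sd = 1.0         # assumed spread of usual levels between people
within_sd = 1.0          # assumed day-to-day fluctuation within a person
true_slope = 2.0         # assumed effect of the usual level on the outcome

# T1: each person's usual level; X1: a single noisy measurement of it.
t1 = [rng.gauss(0.0, between_sd) for _ in range(n)]
x1 = [t + rng.gauss(0.0, within_sd) for t in t1]
# The outcome depends on the usual level, not on the day's reading.
y = [true_slope * t + rng.gauss(0.0, 1.0) for t in t1]

slope_true = ols_slope(t1, y)    # what we would get if T1 were observed
slope_noisy = ols_slope(x1, y)   # what we get from one measurement

# Share of the variance in X1 that reflects real between-person differences.
reliability = between_sd ** 2 / (between_sd ** 2 + within_sd ** 2)

print("slope using T1:", round(slope_true, 3))
print("slope using X1:", round(slope_noisy, 3))
print("true slope * reliability:", round(true_slope * reliability, 3))
```

If the within-person variance can be estimated, for example from
repeat measurements on a subsample, the same ratio can in principle be
used to correct the slope, but that step is not shown here.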

Thanks again!

Regards,

Juliet

On Sat, Mar 7, 2009 at 1:21 PM, Jonathan Baron <baron at psych.upenn.edu> wrote:
> If you form categories, you add even more error, specifically, the
> variation in the distance between each number and the category
> boundary.
>
> What's wrong with just including it in the regression?
>
> Yes, the measure X1 will account for less variance than the underlying
> variable of real interest (T1, each individual's mean, perhaps), but
> X1 could still be useful in two ways.  One, it might be a significant
> predictor of the dependent variable Y despite the error.  Two, it
> might increase the sensitivity of the model to other predictors (X2,
> X3...) by accounting for what would otherwise be error.
>
> What you cannot conclude in this case (when you measure a predictor
> with error) is that the effect of (say) X2 is not accounted for by its
> correlation with T1.  Some people try to conclude this when X2 remains
> a significant predictor of Y when X1 is included in the model.  The
> trouble is that X1 is an error-prone measure of T1, so the full effect
> of T1 is not removed by inclusion of X1.
>
> Jon
>
> On 03/07/09 12:49, Juliet Hannah wrote:
>> Hi, This is not an R question, but I've seen opinions given on non-R
>> topics, so I wanted to give it a try. :)
>>
>> How would one treat a variable that was measured once, but is known to
>> fluctuate a lot? For example, I want to include a hormone in my
>> regression as an explanatory variable. However, this hormone varies in
>> its levels throughout a day. Nevertheless, its levels differ
>> substantially between individuals so that there is information there
>> to use.
>>
>> One simple thing to try would be to form categories, but I assume
>> there are better ways to handle this. Has anyone worked with such
>> data, or could anyone suggest some keywords that may be helpful in
>> searching for this topic? Thanks for your input.
>>
>> Regards,
>>
>> Juliet
>>
>
> --
> Jonathan Baron, Professor of Psychology, University of Pennsylvania
> Home page: http://www.sas.upenn.edu/~baron
> Editor: Judgment and Decision Making (http://journal.sjdm.org)
>
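
Jon's caution about X2 can be illustrated with a small simulation
sketch. The setup is an assumption made for illustration only: Y
depends on T1 alone, X2 is merely correlated with T1, and X1 is T1
plus independent noise. The helper two_predictor_ols and all numeric
values are hypothetical. With T1 itself in the model, the coefficient
on X2 should be close to zero; with the error-prone X1 in its place,
X2 is expected to pick up part of T1's effect.

```python
import random


def two_predictor_ols(xa, xb, y):
    """Least-squares coefficients (on xa, on xb) for y ~ xa + xb + intercept."""
    n = len(y)
    ma, mb, my = sum(xa) / n, sum(xb) / n, sum(y) / n
    saa = sum((a - ma) ** 2 for a in xa)
    sbb = sum((b - mb) ** 2 for b in xb)
    sab = sum((a - ma) * (b - mb) for a, b in zip(xa, xb))
    say = sum((a - ma) * (v - my) for a, v in zip(xa, y))
    sby = sum((b - mb) * (v - my) for b, v in zip(xb, y))
    det = saa * sbb - sab ** 2
    coef_a = (sbb * say - sab * sby) / det
    coef_b = (saa * sby - sab * say) / det
    return coef_a, coef_b


rng = random.Random(2)   # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size

t1 = [rng.gauss(0.0, 1.0) for _ in range(n)]        # underlying variable
x1 = [t + rng.gauss(0.0, 1.0) for t in t1]          # noisy measure of T1
x2 = [0.7 * t + rng.gauss(0.0, 0.7) for t in t1]    # correlated with T1
y = [1.0 * t + rng.gauss(0.0, 1.0) for t in t1]     # no direct X2 effect

# Coefficient on X2 when the true T1 is in the model.
_, b2_given_t1 = two_predictor_ols(t1, x2, y)
# Coefficient on X2 when only the error-prone X1 is available.
_, b2_given_x1 = two_predictor_ols(x1, x2, y)

print("X2 coefficient, adjusting for T1:", round(b2_given_t1, 3))
print("X2 coefficient, adjusting for X1:", round(b2_given_x1, 3))
```

In this sketch a clearly nonzero X2 coefficient in the second model
does not show that X2 has an effect of its own, which is the
conclusion Jon warns against.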
