[R] using a noisy variable in regression (not an R question)

Stephan Kolassa Stephan.Kolassa at gmx.de
Sat Mar 7 19:59:01 CET 2009


Hi Juliet,

Juliet Hannah wrote:
> 
> I should have emphasized, I do not intend to categorize -- mainly
> because of all the discussions I have seen on R-help arguing against
> this.

Sorry that we all jumped on this ;-)

> I just thought it would be problematic to include the variable by
> itself. Take other variables, such as a genotype or BMI. If we measure
> this variable the next day, it would be the same. However, a hormone's
> level would not be the same. I thought this error must be accounted
> for somehow.

You are quite correct that fluctuating hormone levels are a problem 
(although, strictly speaking, measuring BMI and even genotyping will not 
yield exactly the same results the next day either; measurement error is 
always present). There may be methods for dealing with this, but I don't 
know of any.

If you have any idea about the variability of your hormone, you could 
always take your data, perturb the hormone levels (say, by adding random 
noise of that magnitude), and rerun the analysis several times to get a 
feeling for the stability of your results. This is quite 
ad hoc, but if I were the reviewer, a perturbation analysis like this 
would greatly reassure me. However, I recently worked with hormones and 
had exactly your problem, and we couldn't find any published data on 
day-to-day variability, so this was not an option - we finally went 
ahead and simply plugged the measurements into R.
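
To make the perturbation idea concrete, here is a minimal sketch in plain 
Python (the same loop is only a few lines in R). Everything in it is an 
assumption for illustration: the data are simulated stand-ins, the helper 
names ols_slope and perturbation_slopes are made up, the day-to-day 
standard deviation noise_sd = 1.5 is an arbitrary guess, and a 
one-predictor regression replaces whatever model you actually fit.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def perturbation_slopes(x, y, noise_sd, n_rep=500, seed=1):
    """Refit the regression n_rep times, each time adding fresh
    Gaussian noise with standard deviation noise_sd to x."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_rep):
        x_pert = [xi + rng.gauss(0.0, noise_sd) for xi in x]
        slopes.append(ols_slope(x_pert, y))
    return slopes


# Simulated stand-in data (assumption): 200 subjects, one hormone value each.
data_rng = random.Random(42)
hormone = [data_rng.gauss(10.0, 3.0) for _ in range(200)]
outcome = [2.0 + 0.5 * h + data_rng.gauss(0.0, 1.0) for h in hormone]

observed = ols_slope(hormone, outcome)

# noise_sd is the assumed day-to-day variability of the hormone.
slopes = sorted(perturbation_slopes(hormone, outcome, noise_sd=1.5))
n = len(slopes)
lower = slopes[int(0.025 * n)]
upper = slopes[int(0.975 * n) - 1]

print(f"slope from the data as measured: {observed:.3f}")
print(f"mean slope after perturbation:   {statistics.fmean(slopes):.3f}")
print(f"central 95% of perturbed slopes: {lower:.3f} to {upper:.3f}")
```

Note that adding noise to a predictor pulls its estimated slope toward 
zero, so the perturbed slopes will sit systematically below the slope from 
the unperturbed data. What the exercise tells you is how quickly your 
conclusions change as the assumed variability grows, not what the 
error-free slope would have been.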

Good luck!
Stephan

> 
> Thanks again!
> 
> Regards,
> 
> Juliet
> 
> On Sat, Mar 7, 2009 at 1:21 PM, Jonathan Baron <baron at psych.upenn.edu> wrote:
>> If you form categories, you add even more error, specifically, the
>> variation in the distance between each number and the category
>> boundary.
>>
>> What's wrong with just including it in the regression?
>>
>> Yes, the measure X1 will account for less variance than the underlying
>> variable of real interest (T1, each individual's mean, perhaps), but
>> X1 could still be useful in two ways.  One, it might be a significant
>> predictor of the dependent variable Y despite the error.  Two, it
>> might increase the sensitivity of the model to other predictors (X2,
>> X3...) by accounting for what would otherwise be error.
>>
>> What you cannot conclude in this case (when you measure a predictor
>> with error) is that the effect of (say) X2 is not accounted for by its
>> correlation with T1.  Some people try to conclude this when X2 remains
>> a significant predictor of Y when X1 is included in the model.  The
>> trouble is that X1 is an error-prone measure of T1, so the full effect
>> of T1 is not removed by inclusion of X1.
>>
>> Jon
>>
>> On 03/07/09 12:49, Juliet Hannah wrote:
>>> Hi, this is not an R question, but I've seen opinions given on non-R
>>> topics, so I wanted to give it a try. :)
>>>
>>> How would one treat a variable that was measured once but is known to
>>> fluctuate a lot? For example, I want to include a hormone in my
>>> regression as an explanatory variable. However, this hormone varies in
>>> its levels throughout the day. Nevertheless, its levels differ
>>> substantially between individuals, so there is information there to use.
>>>
>>> One simple thing to try would be to form categories, but I assume
>>> there are better ways to handle this. Has anyone worked with such data,
>>> or could anyone suggest some keywords that may be helpful in searching
>>> for this topic? Thanks for your input.
>>>
>>> Regards,
>>>
>>> Juliet
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> --
>> Jonathan Baron, Professor of Psychology, University of Pennsylvania
>> Home page: http://www.sas.upenn.edu/~baron
>> Editor: Judgment and Decision Making (http://journal.sjdm.org)
>>
>



