[R] Collinearity? Cannot get logisticRidge{ridge} to work

David Winsemius dwinsemius at comcast.net
Thu May 28 00:03:28 CEST 2015


On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:

> Here is the result-
> 
>> with(a,  table(Sex, Therapy1,  Outcome) )
> , , Outcome = Alive
> 
>        Therapy1
> Sex      no yes
>  female  0   4
>  male    4   5
> 
> , , Outcome = Death
> 
>        Therapy1
> Sex      no yes
>  female  6   3
>  male    3   0

So no survivors among females with no Therapy1, and no deaths among males with Therapy1. Complete separation.
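
A minimal sketch for checking this, assuming the cell counts in the table above are the whole dataset. The original data frame was not posted, so `cells` and the rebuilt `a` below are stand-ins, not the poster's actual objects:

```r
# Cell counts copied from the three-way table above.
# expand.grid() varies Sex fastest, then Therapy1, then Outcome.
cells <- expand.grid(Sex      = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome  = c("Alive", "Death"))
cells$n <- c(0, 4, 4, 5,   # Outcome = Alive
             6, 3, 3, 0)   # Outcome = Death

# Rebuild the three-way table and list the empty cells.
tab3 <- xtabs(n ~ Sex + Therapy1 + Outcome, data = cells)
print(tab3)
zero_cells <- cells[cells$n == 0, c("Sex", "Therapy1", "Outcome")]
print(zero_cells)

# One row per case, as a stand-in for the poster's data frame `a`.
a <- cells[rep(seq_len(nrow(cells)), cells$n),
           c("Sex", "Therapy1", "Outcome")]

# Same model as in the original post; inspect the standard errors.
fit <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
print(summary(fit)$coefficients)
```

If the fit behaves as described in the original post, the coefficient table should show very large standard errors, and glm() may also warn about fitted probabilities of 0 or 1.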

-- 
David.

> 
> 
> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>> 
>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>> 
>>> Thank you very much for your rapid response. I sincerely appreciate your input.
>>> I am sorry for sending the previous email in HTML format.
>>> 
>>> with(a,  table(Sex, Therapy1) )   shows the following.
>>>         Therapy1
>>> Sex      no yes
>>> female  6   7
>>> male    7   5
>>> 
>>> and with(a,  table(Sex, Outcome) ) and with(a,  table(Therapy1, Outcome) )
>>> yield the following:
>>> 
>>>       Outcome
>>> Sex      Alive Death
>>> female     4     9
>>> male       9     3
>>> 
>>>       Outcome
>>> Therapy1 Alive Death
>>>    no      4     9
>>>    yes     9     3
>> 
>> Then what about:
>> 
>> with(a,  table(Sex, Therapy1,  Outcome) )
>> 
>> --
>> David
>> 
>> 
>>> 
>>> As there are no zero cells, it does not seem to be complete separation.
>>> I really appreciate comments.
>>> 
>>> Kengo Inagaki
>>> Memphis, TN
>>> 
>>> 
>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>> 
>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>> 
>>>>> I am currently working on a health care related project using R. I am
>>>>> learning R while working on data analysis.
>>>>> 
>>>>> Below is the part of the data in which I am encountering a problem.
>>>>> 
>>>>> 
>>>>> Case#    Sex         Therapy1             Therapy2             Outcome
>>>>> 
>>>>> 1              male      no
>>>>> no                           Alive
>>>>> 
>>>> 
>>>> snipped mangled data sent in HTML
>>>> 
>>>>> 
>>>>> 
>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>>> predictor variables.
>>>>> 
>>>>> All of the predictors are significantly associated with the outcome by
>>>>> univariate analysis.
>>>>> 
>>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>>> "Therapy1" are not included at the same time. (This is part of a table that
>>>>> I cut out from a larger table for ease of presentation, and there are more
>>>>> predictors that I tested.)
>>>> 
>>>> Please examine the data before reaching for ridge regression:
>>>> 
>>>> What does this show: ...
>>>> 
>>>>   with(a,  table(Sex, Therapy1) )
>>>> 
>>>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>>>> 
>>>> --
>>>> David.
>>>>> 
>>>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>>>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>>>> 
>>>>> The formula used is,
>>>>> 
>>>>> 
>>>>> 
>>>>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)  # "a" is the data frame holding the table above
>>>>> 
>>>>> 
>>>>> 
>>>>> After doing some reading, I suspect this might be collinearity, as vif
>>>>> values (using "vif()" function in car package) were sky high (8,875,841 for
>>>>> both "Sex" and "Therapy1").
>>>>> 
>>>>> Learning that ridge regression may be a solution, I attempted using
>>>>> logisticRidge {ridge} using the following formula, but I get the
>>>>> accompanying error message.
>>>>> 
>>>>> 
>>>>> 
>>>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>> 
>>>>> 
>>>>> 
>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>> 
>>>>> invalid to change the storage mode of a factor
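
That message arises because ifelse() is handed a factor as its test and tries to coerce it to logical, which R does not allow for factors. A sketch of the usual workaround, recoding the response as 0/1 before fitting, is below. The `Outcome` vector is a made-up stand-in, and the commented logisticRidge() call at the end is an assumption that has not been run here:

```r
# Made-up stand-in for the factor response in the post.
Outcome <- factor(c("Alive", "Death", "Death", "Alive"))

# Reproduce the message: ifelse() cannot coerce a factor test to logical.
msg <- tryCatch(ifelse(Outcome, 1, 0),
                error = function(e) conditionMessage(e))
print(msg)

# Recode as integer 0/1, with 1 = "Death".
y <- as.integer(Outcome == "Death")
print(y)

# Hypothetical follow-up (not run here; requires the ridge package):
# a$Dead <- as.integer(a$Outcome == "Death")
# logisticRidge(Dead ~ Sex + Therapy1, data = a)
```

Note that this only addresses the type error. It does nothing about the zero cells in the data.
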
>>>>> 
>>>>> 
>>>>> 
>>>>> At this point I do not have an idea how to solve this and would like to
>>>>> seek help.
>>>>> 
>>>>> I really really appreciate your input!!!
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA


