[R] logistic regression

Mikhail Spivakov ensdev.box at gmail.com
Wed Jun 25 01:30:54 CEST 2008


Hi everyone,

I'm sorry if this turns out to be more a statistical question than one
specifically about R - but would greatly appreciate your advice anyway.

I've been using a logistic regression model to look at the relationship
between a binary outcome (say, the odds of picking n white balls from a bag
containing m balls in total) and a variety of other binary parameters:

_________________________________________________________________

> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> summary(a.fit)

glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family =
binomial(link = "logit"), data = a)

Deviance Residuals: 
 [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Coefficients:
	Estimate	Std.	Error	z value	Pr(>|z|)
(Intercept)	-0.69751	0.02697	-25.861	<2.00E-16	***
A	-0.02911	0.05451	-0.534	0.593335	
B	0.39842	0.06871	5.798	6.70E-09	***
C	0.829	0.06745	12.29	<2.00E-16	***
D	0.05928	0.11133	0.532	0.594401	
A:B	-0.44053	0.13807	-3.191	0.001419	**
A:C	-0.49596	0.13664	-3.63	0.000284	***
B:C	-0.62194	0.14164	-4.391	1.13E-05	***
A:D	-0.4031	0.2279	-1.769	0.076938	.
B:D	-0.60238	0.25978	-2.319	0.020407	*
C:D	-0.58467	0.27195	-2.15	0.031558	*
A:B:C	0.5006	0.27364	1.829	0.067335	.
A:B:D	0.51868	0.4683	1.108	0.268049	
A:C:D	0.32882	0.51226	0.642	0.520943	
B:C:D	0.56301	0.49903	1.128	0.259231	
A:B:C:D	-0.32115	0.87969	-0.365	0.715059	

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.2185e+02  on 15  degrees of freedom
Residual deviance: 1.0385e-12  on  0  degrees of freedom
AIC: 124.50

Number of Fisher Scoring iterations: 3

_________________________________________________________________

This seems to produce sensible results given the actual data.
However, there are actually three types of balls in the experiment and I
need to model the relationship between the odds of picking each of the type
and the parameters A,B,C,D. So what I do now is split the initial data table
and just run glm three times:

>all

[fictional data]

TYPE WHITE ALL A B C D 
a	100	400	1	0	0	0
b	200	600	1	0	0	0
c	10	300	1	0	0	0
....
a	30	100	1	1	1	1
b	50	200	1	1	1	1
c	20	120	1	1	1	1

> a<-all[all$type=="a",]
> b<-all[all$type=="b",]
> c<-all[all$type=="c",]
> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))

But it seems to me that I should be able to incorporate TYPE into the model. 

Something like:

>summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(WHITE,ALL-WHITE)~TYPE*A*B*C*D))

[please see the output below]

However, when I do this, it does not seem to give an expected result.
Is this not the right way to do it? 
Or this is actually less powerful than running the three models separately?  

Will greatly appreciate your advice!

Many thanks
Mikhail

-----

	Estimate	Std.	Error	z value	Pr(>|z|)
(Intercept)	-8.90E-01	1.91E-02	-46.553	<2.00E-16	***
TYPE1	1.93E-01	2.47E-02	7.822	5.18E-15	***
TYPE2	1.19E+00	2.42E-02	49.108	<2.00E-16	***
A	1.89E-01	3.34E-02	5.665	1.47E-08	***
B	1.60E-01	4.41E-02	3.627	0.000286	***
C	2.24E-02	4.91E-02	0.455	0.64906	
D	1.96E-01	6.58E-02	2.982	0.002868	**
TYPE1:A	-2.19E-01	4.59E-02	-4.759	1.95E-06	***
TYPE2:A	-9.08E-01	4.50E-02	-20.178	<2.00E-16	***
TYPE1:C	2.39E-01	5.93E-02	4.022	5.77E-05	***
TYPE2:B	-1.82E+00	6.46E-02	-28.178	<2.00E-16	***
A:B	-2.26E-01	8.52E-02	-2.649	0.008066	**
TYPE1:C	8.07E-01	6.27E-02	12.87	<2.00E-16	***
TYPE2:C	-2.51E+00	7.83E-02	-32.039	<2.00E-16	***
A:C	-1.70E-01	9.51E-02	-1.783	0.074512	.
B:C	-3.01E-01	1.12E-01	-2.698	0.006985	**
TYPE1:D	-1.37E-01	9.20E-02	-1.489	0.136548	
TYPE2:D	-1.13E+00	9.19E-02	-12.329	<2.00E-16	***
A:D	-2.11E-01	1.27E-01	-1.655	0.097953	.
B:D	-2.15E-01	1.55E-01	-1.387	0.165472	
C:D	-5.51E-01	2.76E-01	-1.997	0.045829	*
TYPE1:A:B	-2.15E-01	1.17E-01	-1.84	0.065714	.


TYPE2:A:B	7.21E-01	1.28E-01	5.635	1.75E-08	***
TYPE1:A:C	-3.26E-01	1.24E-01	-2.643	0.008221	**
TYPE2:A:C	9.70E-01	1.53E-01	6.36	2.02E-10	***
TYPE1:B:C	-3.21E-01	1.38E-01	-2.321	0.020313	*
TYPE2:B:C	1.35E+00	1.89E-01	7.133	9.85E-13	***
A:B:C	1.80E-01	2.11E-01	0.852	0.394425	
TYPE1:A:D	-1.92E-01	1.83E-01	-1.05	0.293758	
TYPE2:A:D	6.76E-01	1.80E-01	3.75	0.000177	***
TYPE1:B:D	-3.87E-01	2.16E-01	-1.796	0.072443	.
TYPE2:B:D	1.09E+00	2.30E-01	4.709	2.49E-06	***
A:B:D	1.92E-01	2.73E-01	0.702	0.482512	
TYPE1:C:D	-3.33E-02	3.18E-01	-0.105	0.916465	
TYPE2:C:D	1.20E-01	5.05E-01	0.238	0.811914	
A:C:D	-7.37E+00	1.74E+04	-4.23E-04	0.999663	
B:C:D	3.14E-01	4.92E-01	0.638	0.523254	
TYPE1:A:B:C	3.21E-01	2.64E-01	1.218	0.223336	
TYPE2:A:B:C	-8.43E-01	3.59E-01	-2.351	0.018747	*
TYPE1:A:B:D	3.27E-01	3.84E-01	0.85	0.3952	
TYPE2:A:B:D	-6.59E-01	4.08E-01	-1.617	0.105883	
TYPE1:A:C:D	7.69E+00	1.74E+04	4.42E-04	0.999648	
TYPE2:A:C:D	-1.60E+01	3.48E+04	-4.58E-04	0.999634	
TYPE1:B:C:D	2.49E-01	5.70E-01	0.437	0.662288
TYPE2:B:C:D	-7.08E-01	8.97E-01	-0.789	0.430007
A:B:C:D	9.08E-03	2.47E+04	3.67E-07	1
TYPE1:A:B:C:D	-3.30E-01	2.47E+04	-1.34E-05	0.999989
TYPE2:A:B:C:D	1.10E+00	4.94E+04	2.22E-05	0.999982
-- 
View this message in context: http://www.nabble.com/logistic-regression-tp18102137p18102137.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list