[R] Regression model

srecko joksimovic sreckojoksimovic at gmail.com
Fri Nov 22 00:52:47 CET 2013


Hi,

I'm trying to fit a regression model, but there is something wrong with it.
The dataset contains 85 observations for 85 students. Those observations are
counts of several actions, and the dependent variable is the final score. More
precisely, I have five independent variables (IVs) and one dependent variable
(DV). I'm trying to build a regression model to check whether those variables
can predict the final score.

I'm attaching the output of several steps, but I tried the following procedure:
- build a model with only two of the variables (IA and IC)
- the summary shows that neither of them is a significant predictor of the
final outcome
- a test for multicollinearity revealed tolerance below 0.2 (a potential
problem)
- build two new models, each having only one of those variables as a
predictor
- both models show that the variable used in the model is a significant
predictor. Separately they are significant, together not. Probably a
multicollinearity problem, but...
- as I keep adding other variables to one or the other model, Multiple
R-squared increases slightly
- I tried to compare different models using anova, but none of them seems to
be better.
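
The steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data frame `sim` is simulated (it is not the attached `social_presence_data`), the coefficients used to generate it are made up, and the tolerance is computed by hand instead of with `vif()` so the sketch needs no extra packages. The AIC comparison at the end is one common way to rank models that are not nested; it was not part of the procedure described above.

```r
# Hypothetical stand-in for social_presence_data (simulated, not the real data)
set.seed(1)
n    <- 85
IC   <- rpois(n, lambda = 20)
IA   <- pmax(0, round(0.5 * IC + rnorm(n, sd = 1)))  # strongly correlated with IC
mark <- 2.8 + 0.03 * IC + rnorm(n)
sim  <- data.frame(mark, IA, IC)

m.both <- lm(mark ~ IA + IC, data = sim)
m.ia   <- lm(mark ~ IA, data = sim)
m.ic   <- lm(mark ~ IC, data = sim)

# Tolerance of IA = 1 - R^2 from regressing IA on the other predictor(s).
# With exactly two predictors this equals 1 - cor(IA, IC)^2.
tol <- 1 - summary(lm(IA ~ IC, data = sim))$r.squared
print(tol)

# F tests are only meaningful for nested models (smaller model first)
print(anova(m.ia, m.both))
print(anova(m.ic, m.both))

# mark ~ IA and mark ~ IC are not nested, so anova() gives no test for them;
# an information criterion such as AIC can still rank them (lower is better)
aic <- AIC(m.ia, m.ic, m.both)
print(aic)
```

With real data, `sim` would be replaced by the actual data frame, and all models being compared should be fitted to the same rows.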

How can I determine which model is better?

Thanks
-------------- next part --------------
> lm.all.1 <- lm(mark~IA+IC, data=social_presence_data)
> summary(lm.all.1)

Call:
lm(formula = mark ~ IA + IC, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5969 -0.2573  0.2599  0.5819  1.2955 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.78938    0.24599  11.339   <2e-16 ***
IA           0.02844    0.04503   0.632    0.530    
IC           0.01979    0.02601   0.761    0.449    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.031 on 79 degrees of freedom
Multiple R-squared:   0.12,	Adjusted R-squared:  0.09774 
F-statistic: 5.387 on 2 and 79 DF,  p-value: 0.006407

> 1/vif(lm.all.1)
       IA        IC 
0.1719037 0.1719037 
> dwt(lm.all.1)
 lag Autocorrelation D-W Statistic p-value
   1      0.09176706      1.815883   0.372
 Alternative hypothesis: rho != 0
> lm.all.2 <- lm(mark~IA, data=social_presence_data)
> lm.all.3 <- lm(mark~IC, data=social_presence_data)
> anova(lm.all.2, lm.all.3)
Analysis of Variance Table

Model 1: mark ~ IA
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq F Pr(>F)
1     80 84.604                      
2     80 84.413  0   0.19141         
> anova(lm.all.1, lm.all.3)
Analysis of Variance Table

Model 1: mark ~ IA + IC
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989                           
2     80 84.413 -1  -0.42402 0.3988 0.5295
> anova(lm.all.1, lm.all.2)
Analysis of Variance Table

Model 1: mark ~ IA + IC
Model 2: mark ~ IA
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989                           
2     80 84.604 -1  -0.61543 0.5789  0.449
> summary(lm.all.2)

Call:
lm(formula = mark ~ IA, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5409 -0.2539  0.2283  0.5793  1.2956 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.88517    0.21078  13.688  < 2e-16 ***
IA           0.05961    0.01862   3.202  0.00196 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.028 on 80 degrees of freedom
Multiple R-squared:  0.1136,	Adjusted R-squared:  0.1025 
F-statistic: 10.25 on 1 and 80 DF,  p-value: 0.001962

> summary(lm.all.3)

Call:
lm(formula = mark ~ IC, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6320 -0.2562  0.2590  0.5764  1.2585 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.76364    0.24168  11.435  < 2e-16 ***
IC           0.03473    0.01074   3.233  0.00178 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.027 on 80 degrees of freedom
Multiple R-squared:  0.1156,	Adjusted R-squared:  0.1045 
F-statistic: 10.45 on 1 and 80 DF,  p-value: 0.001779

> lm.all.3.1 <- lm(mark~IC+AU, data=social_presence_data)
> summary(lm.all.3.1)

Call:
lm(formula = mark ~ IC + AU, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5951 -0.2618  0.2378  0.5907  1.2619 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.77600    0.24499  11.331  < 2e-16 ***
IC           0.03276    0.01191   2.752  0.00735 ** 
AU           0.04994    0.12697   0.393  0.69514    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.033 on 79 degrees of freedom
Multiple R-squared:  0.1173,	Adjusted R-squared:  0.09496 
F-statistic: 5.249 on 2 and 79 DF,  p-value: 0.007236

