[R] Prediction from a rank deficient fit may be misleading
    Michael Artz 
    michaeleartz at gmail.com
       
    Thu Mar 10 23:21:31 CET 2016
    
    
  
Here are the results of the logistic regression model.  Is the warning
caused by the NA values?
Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259
Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620   0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
genderMale                           -0.0350970  0.0776896  -0.452   0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
InternetServiceNo                            NA         NA      NA       NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
OnlineBackupNo internet service              NA         NA      NA       NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
OnlineSecurityNo internet service            NA         NA      NA       NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326   0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
PhoneServiceYes                              NA         NA      NA       NA
SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
StreamingMoviesNo internet service           NA         NA      NA       NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
StreamingTVNo internet service               NA         NA      NA       NA
StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
TechSupportNo internet service               NA         NA      NA       NA
TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
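
For anyone reading this in the archive: the seven NA rows are the coefficients glm() could not estimate because their dummy columns repeat information already in the model. A likely reading of the output above is that each "No internet service" level marks the same rows as InternetService == "No", and that PhoneServiceYes is determined by MultipleLines "No phone service", but that is an inference from the coefficient names, not something checked on churn_training. Here is a minimal sketch on made-up data (the names service, backup and y are hypothetical) that reproduces the pattern and uses alias() to list the dropped terms:

```r
# Toy data, NOT the churn data: the add-on factor has a
# "No internet service" level that repeats service == "No".
set.seed(1)
n <- 200
service <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
backup <- factor(ifelse(service == "No", "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))
y <- rbinom(n, 1, 0.4)  # outcome unrelated to the predictors

fit <- glm(y ~ backup + service, family = binomial(link = "logit"))

# Coefficients reported as NA are the ones glm() dropped as redundant.
dropped <- names(which(is.na(coef(fit))))
print(dropped)

# alias() writes each dropped term as a combination of the retained ones.
print(alias(fit)$Complete)
```

In this sketch the dropped term is the later of the two identical columns in the formula, which matches the output above, where DeviceProtection's "No internet service" level gets an estimate and the later ones are NA.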
On Thu, Mar 10, 2016 at 4:05 PM, David Winsemius <dwinsemius at comcast.net>
wrote:
>
> > On Mar 10, 2016, at 8:08 AM, Michael Artz <michaeleartz at gmail.com> wrote:
> >
> > Hi all,
> > I have the following error -
> >> resultVector <- predict(logitregressmodel, dataset1, type='response')
> > Warning message:
> > In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
> >  prediction from a rank-deficient fit may be misleading
>
> It wasn't an R error. It was an R warning. Was the `summary` output on
> logitregressmodel informative? Does the resultVector look sensible given
> its inputs?
>
>
> > I have seen on the internet that collinearity in the data may be
> > causing this.  How can I be sure?
>
> Do some diagnostics. Look carefully at the output of
> summary(logitregressmodel), and perhaps summary(dataset1) if it was the
> original input to the modeling functions. Then you could move on to
> looking at cross-correlations on things you think are continuous,
> crosstabs on factor variables, and the condition number of the full data
> matrix.
>
> Lots of stuff turns up on search for "detecting collinearity condition
> number in r"
>
> >
> > Thanks
> >
>
> David Winsemius
> Alameda, CA, USA
>
>
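
The diagnostics David lists above (crosstabs on factors, correlations on continuous variables, condition number of the full data matrix) could be sketched roughly as follows. This runs on simulated data with hypothetical column names (phone, lines, tenure, monthly, total), so treat it as an illustration of the checks, not as results for churn_training:

```r
# Simulated stand-in data, NOT the churn data.
set.seed(2)
n <- 200
phone <- factor(sample(c("No", "Yes"), n, replace = TRUE))
lines <- factor(ifelse(phone == "No", "No phone service",
                       sample(c("No", "Yes"), n, replace = TRUE)))
tenure <- sample(1:72, n, replace = TRUE)
monthly <- runif(n, 20, 120)
total <- tenure * monthly + rnorm(n, sd = 50)
d <- data.frame(phone, lines, tenure, monthly, total)

# Crosstab of two factors: a level confined to a single cell of the
# other factor is a sign that one dummy column duplicates another.
print(table(d$lines, d$phone))

# Pairwise correlations among the continuous predictors.
print(round(cor(d[, c("tenure", "monthly", "total")]), 2))

# Compare the numerical rank of the design matrix with its column count;
# rank < columns means an exact linear dependence.
X <- model.matrix(~ lines + phone + tenure + monthly + total, data = d)
r <- qr(X)$rank
print(c(columns = ncol(X), rank = r))

# Condition number of the design matrix; very large values point to
# (near-)collinearity.
print(kappa(X, exact = TRUE))
```

On real data the same calls would take the training data frame and the model formula in place of d and the toy formula.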