[R] Prediction from a rank deficient fit may be misleading

Thu Mar 10 23:21:31 CET 2016

Here are the results of the logistic regression model.  Is the warning
caused by the NA values?

Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259

Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620   0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
genderMale                           -0.0350970  0.0776896  -0.452   0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
InternetServiceNo                            NA         NA      NA       NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
OnlineBackupNo internet service              NA         NA      NA       NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
OnlineSecurityNo internet service            NA         NA      NA       NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326   0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
PhoneServiceYes                              NA         NA      NA       NA
SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
StreamingMoviesNo internet service           NA         NA      NA       NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
StreamingTVNo internet service               NA         NA      NA       NA
StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
TechSupportNo internet service               NA         NA      NA       NA
TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
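
The NA rows in the table above are the coefficients R could not estimate.
A likely reading is that each one belongs to a dummy column that repeats
information already carried by an earlier term (for example, every "No
internet service" level marks the same customers as InternetServiceNo).
The sketch below reproduces that pattern on made-up data. The data frame
`toy` and the helper `make_addon` are hypothetical and are not part of
churn_training.

```r
# Toy data, NOT the poster's churn_training: an add-on service is recorded
# as "No internet service" exactly when InternetService is "No".
set.seed(1)
n <- 300
internet <- sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE)

# Hypothetical helper: random No/Yes, forced to "No internet service"
# for customers without internet.
make_addon <- function(internet) {
  x <- sample(c("No", "Yes"), length(internet), replace = TRUE)
  x[internet == "No"] <- "No internet service"
  factor(x)
}

toy <- data.frame(
  y                = rbinom(n, 1, 0.3),
  DeviceProtection = make_addon(internet),
  InternetService  = factor(internet),
  OnlineBackup     = make_addon(internet)
)

fit <- glm(y ~ DeviceProtection + InternetService + OnlineBackup,
           family = binomial(link = "logit"), data = toy)

# Coefficients reported as NA correspond to redundant dummy columns
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# The crosstab shows the two "no internet" levels always occur together
print(with(toy, table(DeviceProtection, InternetService)))

# Rank of the fit versus number of columns in the design matrix
print(c(rank = fit$rank, columns = ncol(model.matrix(fit))))
```

Running the same crosstab on the real factors (for example
InternetService against OnlineBackup, or PhoneService against
MultipleLines) would show whether the same structure is present there.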

On Thu, Mar 10, 2016 at 4:05 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Mar 10, 2016, at 8:08 AM, Michael Artz <michaeleartz at gmail.com> wrote:
> >
> > Hi all,
> > I have the following error:
> >> resultVector <- predict(logitregressmodel, dataset1, type='response')
> > Warning message:
> > In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
> >  prediction from a rank-deficient fit may be misleading
>
> It wasn't an R error. It was an R warning. Was the `summary` output on
> logitregressmodel informative? Does the resultVector look sensible given
> its inputs?
>
>
> > I have seen on the internet that collinearity in the data may be
> > causing this.  How can I be sure?
>
> Do some diagnostics. Start by looking carefully at the output of
> summary(logitregressmodel), and perhaps summary(dataset1) if it was the
> original input to the modeling functions. Then you could move on to
> looking at cross-correlations of the variables you think are continuous,
> crosstabs of the factor variables, and the condition number of the full
> data matrix.
>
> Lots of material turns up on a search for "detecting collinearity
> condition number in r"
>
> >
> > Thanks
>
> David Winsemius
> Alameda, CA, USA
>
>
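
Two of the diagnostics suggested in the quoted reply, cross-correlations
of the continuous variables and a condition number, can be sketched as
follows. The data are simulated for illustration only; the assumption
that TotalCharges is roughly tenure times MonthlyCharges is mine, not
something stated in the thread.

```r
# Simulated stand-ins for the continuous predictors (not the real data).
# Assumption: TotalCharges is approximately tenure * MonthlyCharges.
set.seed(2)
n <- 200
tenure         <- sample(1:72, n, replace = TRUE)
MonthlyCharges <- runif(n, 20, 110)
TotalCharges   <- tenure * MonthlyCharges + rnorm(n, sd = 50)

X <- cbind(tenure, MonthlyCharges, TotalCharges)

# Pairwise correlations: large off-diagonal values flag near-collinearity
cor_mat <- cor(X)
print(round(cor_mat, 2))

# 2-norm condition number of the centred and scaled matrix;
# larger values mean the columns are closer to linearly dependent
cond <- kappa(scale(X), exact = TRUE)
print(cond)
```

On the real data the same two calls could be applied to the numeric
columns of churn_training. Exact dependencies among factor levels will
not show up in cor() and are easier to spot with table().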
