[R] Training a model using glm

Mohan Radhakrishnan radhakrishnan.mohan at gmail.com
Wed Sep 17 20:04:20 CEST 2014


Hi Dennis,

Why is there that warning? I think my syntax is right, isn't it? So can
the warning be ignored?

Thanks,
Mohan

On Wed, Sep 17, 2014 at 9:48 PM, Dennis Murphy <djmuser at gmail.com> wrote:

> No reproducible example (i.e., no data) supplied, but the following
> should work in general, so I'm presuming this maps to the caret
> package as well. Thoroughly untested.
>
> library(caret)    # something you failed to mention
>
> ...
> # method = "glm" gives a logistic regression for a two-class outcome
> modelFit <- train(diagnosis ~ ., data = training1, method = "glm")
> # caret's confusionMatrix() takes the predictions first, then the reference
> confusionMatrix(predict(modelFit, newdata = test1), test1$diagnosis)
>
> For GLMs, there are several types of possible predictions. The default
> is 'link', which returns values on the scale of the linear predictor;
> 'response' returns fitted values on the scale of the outcome. caret may
> have a different syntax, so you should check its help pages re the
> supported predict methods.
>
> Hint: If a function takes a data = argument, you don't need to specify
> the variables as components of the data frame - the variable names are
> sufficient. You should also do some reading to understand why the
> model formula I used is correct if you're modeling one variable as
> response and all others in the data frame as covariates.
>
> Dennis
>
> On Tue, Sep 16, 2014 at 11:15 PM, Mohan Radhakrishnan
> <radhakrishnan.mohan at gmail.com> wrote:
> > I answered this question, which was part of an online course, correctly
> > by executing some commands and guessing.
> >
> > But I didn't understand the approach, even though my R code works.
> >
> > I have a training and test dataset.
> >
> >> nrow(training)
> > [1] 251
> >
> >> nrow(testing)
> > [1] 82
> >
> >> head(training1)
> >    diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
> > 6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
> > 10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
> > 11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
> > 12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
> > 13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
> > 16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
> >          IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
> > 6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
> > 10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
> > 11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
> > 12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
> > 13  0.0000000  0.4223589   -0.53219115 1.564217 1.691393
> > 16  0.2623643  0.4223589    0.18739989 1.269636 1.705116
> >
> > The testing dataset is similar, with 13 columns. Only the number of rows
> > differs.
> >
> >
> > training1 <- training[,grepl("^IL|^diagnosis",names(training))]
> >
> > test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
> >
> > modelFit <- train(training1$diagnosis ~ training1$IL_11 + training1$IL_13 +
> >   training1$IL_16 + training1$IL_17E + training1$IL_1alpha + training1$IL_3 +
> >   training1$IL_4 + training1$IL_5 + training1$IL_6 + training1$IL_6_Receptor +
> >   training1$IL_7 + training1$IL_8, method="glm", data=training1)
> >
> > confusionMatrix(test1$diagnosis,predict(modelFit, test1))
> >
> > I get this error when I run the above command to get the confusion matrix:
> >
> > 'newdata' had 82 rows but variables found have 251 rows
> >
> > I thought this was simple. I train a model using the training dataset and
> > predict using the test dataset and get the accuracy.
> >
> > Am I missing the obvious here?
> >
> > Thanks,
> >
> > Mohan
> >
>
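
The row-count message quoted above can be reproduced without caret. What follows is only a minimal sketch: it uses base R's glm() and the built-in mtcars data, and the train_df/test_df split and the chosen columns are illustrative assumptions, not the course data from the thread. It shows that writing the formula as train_df$y ~ train_df$x ties the model terms to the training data frame, so predict() ignores newdata and returns one value per training row.

```r
# Illustrative split of a built-in dataset (not the data from the thread).
train_df <- mtcars[1:24, ]
test_df  <- mtcars[25:32, ]

# Formula written with train_df$...: the terms refer to train_df itself,
# so predict() cannot find matching columns in newdata.
bad_fit  <- glm(train_df$am ~ train_df$wt + train_df$hp,
                family = binomial, data = train_df)
bad_pred <- suppressWarnings(predict(bad_fit, newdata = test_df))
length(bad_pred)    # one value per training row, not per test row

# Formula written with bare column names: predict() looks them up in newdata.
good_fit  <- glm(am ~ wt + hp, family = binomial, data = train_df)
good_pred <- predict(good_fit, newdata = test_df)
length(good_pred)   # one value per test row
```

Without the suppressWarnings() wrapper, the first predict() call should raise a warning of the same form as the one reported in the thread. With bare column names (or diagnosis ~ . as suggested above), the prediction vector has the same length as the test set, which is what confusionMatrix() needs.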

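On the prediction types mentioned above: the following is a small sketch of the difference between 'link' and 'response' for a plain glm() fit, again using the built-in mtcars data as a stand-in (the model and the 0.5 cutoff are assumptions for illustration). As far as I know, caret's predict method for train objects takes type = "raw" or type = "prob" instead, so ?predict.train is the place to check.

```r
# Logistic regression on a built-in dataset (illustrative only).
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# Default type = "link": values on the scale of the linear predictor (log-odds).
eta <- predict(fit, newdata = mtcars)

# type = "response": fitted probabilities between 0 and 1.
p <- predict(fit, newdata = mtcars, type = "response")

# The two are related through the inverse logit.
all.equal(plogis(eta), p)

# Turning probabilities into class labels needs an explicit cutoff (0.5 assumed).
pred_class <- ifelse(p > 0.5, 1, 0)
table(predicted = pred_class, actual = mtcars$am)
```

The point of the sketch is that neither type returns class labels for a glm fit, so a cutoff step is needed before building a confusion table by hand.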