[R] Help understanding why glm and lrm.fit runs with my data, but lrm does not

Frank Harrell f.harrell at vanderbilt.edu
Thu Sep 14 18:21:52 CEST 2017


Fixed 'maxiter' in the help file.  Thanks.

Please give the original source of that dataset.

That dataset is a tiny sample of GUSTO-I and not large enough to fit this
model very reliably.

A nomogram using the full dataset (not publicly available to my knowledge)
is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf

Use lrm, not lrm.fit for this.  Adding maxit=20 will probably make it work
on the small dataset but still not clear on why you are using this dataset.

Frank


------------------------------
Frank E Harrell Jr      Professor      School of Medicine

Department of *Biostatistics*      *Vanderbilt University*

On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <
> L.J.Bonnett at liverpool.ac.uk> wrote:
> >
> > Dear all,
> >
> > I am using the publically available GustoW dataset.  The exact version I
> am using is available here: https://na01.safelinks.
> protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%
> 3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk&data=02%7C01%7Cf.harrell%40vanderbilt.edu%
> 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa
> ecad%7C0%7C0%7C636410009046132507&sdata=UZgX3%2Ba%
> 2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D&reserved=0
> >
> > I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP,
> HRT and ANT.  I have successfully fitted a logistic regression model using
> the "glm" function as shown below.
> >
> > library(rms)
> > gusto <- spss.get("GustoW.sav")
> > fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=
> binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
> >
> > However, my review of the literature and other websites suggest I need
> to use "lrm" for the purposes of producing a nomogram.  When I run the
> command using "lrm" (see below) I get an error message saying:
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > My code is as follows:
> > gusto2 <- gusto[,c(1,3,5,8,9,10)]
> > gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> > gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> > gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> > gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> > var.labels=c(DAY30="30-day Mortality", AGE="Age in Years",
> KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior
> Infarct Location")
> > label(gusto2)=lapply(names(var.labels),function(x)
> label(gusto2[,x])=var.labels[x])
> >
> > ddist = datadist(gusto2)
> > options(datadist='ddist')
> >
> > fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > Online solutions to this problem involve checking whether any variables
> are redundant.  However, the results for my data suggest  that none are.
> > redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Redundancy Analysis
> >
> > redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
> >
> > n: 2188         p: 5    nk: 3
> >
> > Number of NAs:   0
> >
> > Transformation of target variables forced to be linear
> >
> > R-squared cutoff: 0.9   Type: ordinary
> >
> > R^2 with which each variable can be predicted from all other variables:
> >
> >   AGE    HYP KILLIP    HRT    ANT
> > 0.028  0.032  0.053  0.046  0.040
> >
> > No redundant variables
> >
> > I've also tried just considering "lrm.fit" and that code seems to run
> without error too:
> > lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,
> gusto2$HRT,gusto2$ANT),gusto2$DAY30)
> >
> > Logistic Regression Model
> >
> > lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
> >     gusto2$ANT), y = gusto2$DAY30)
> >
> >                       Model Likelihood     Discrimination    Rank
> Discrim.
> >                          Ratio Test           Indexes           Indexes
> > Obs          2188    LR chi2     233.59    R2       0.273    C
>  0.846
> >  0           2053    d.f.             5    g        1.642    Dxy
>  0.691
> >  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma
>  0.696
> > max |deriv| 4e-09                          gp       0.079    tau-a
>  0.080
> >                                            Brier    0.048
> >
> >           Coef     S.E.   Wald Z Pr(>|Z|)
> > Intercept -13.8515 0.9694 -14.29 <0.0001
> > x[1]        0.0989 0.0103   9.58 <0.0001
> > x[2]        0.9030 0.1510   5.98 <0.0001
> > x[3]        1.3576 0.2570   5.28 <0.0001
> > x[4]        0.6884 0.2034   3.38 0.0007
> > x[5]        0.6327 0.2003   3.16 0.0016
> >
> > I was therefore hoping someone would explain why the "lrm" code is
> producing an error message, while "lrm.fit" and "glm" do not.  In
> particular I would welcome a solution to ensure I can produce a nomogram.
>
> Try this:
>
> lrm  # look at code, do a search on "fail"
> ?lrm.fit  # read the structure of the returned value of lrm.fit
>
> my.fit <- lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP,
> gusto2$HRT,
>     gusto2$ANT), y = gusto2$DAY30)
>
> print(my.fit$fail)  # the error message you got from the lrm call means
> convergence failed
>
> Documentation bug: The documentation of the cause of the 'fail'- value
> incorrectly gives the name of this parameter as 'maxiter' in the  Value
> section.
>
> --
> David.
>
>
>
> >
> > Kind regards,
> > Laura
> >
> > Dr Laura Bonnett
> > NIHR Post-Doctoral Fellow
> >
> > Department of Biostatistics,
> > Waterhouse Building, Block F,
> > 1-5 Brownlow Street,
> > University of Liverpool,
> > Liverpool,
> > L69 3GL
> >
> > 0151 795 9686
> > L.J.Bonnett at liverpool.ac.uk
> >
> >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://na01.safelinks.protection.outlook.com/?url=
> https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&
> data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb88
> 07f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%
> 7C636410009046132507&sdata=GAPis8GXCfundLz48dX66AZfVTzxs%
> 2BNBUmG1kgpx2Ro%3D&reserved=0
> > PLEASE do read the posting guide https://na01.safelinks.
> protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.
> org%2Fposting-guide.html&data=02%7C01%7Cf.harrell%40vanderbilt.edu%
> 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa
> ecad%7C0%7C0%7C636410009046132507&sdata=C8xd7UizYeLM6bylOyad8bumQTsYOz
> FYZu2IcMo%2BUII%3D&reserved=0
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list