[R] Regression lines for differently-sized groups on the same plot

Laura M Marx marxlau1 at msu.edu
Wed Jul 20 01:53:23 CEST 2005

```Hi there,
plots in the r-help archive, and can't find a post where someone has offered
a solution for my specific problem.  I need to plot logistic regression fits
from three differently-sized data subsets on a plot of the entire dataset.
A description and code are below:
I have an unbalanced dataset consisting of three different species (hem,
yb, and sm), with unequal numbers of wood pieces in each species group.  I
am trying to generate a plot that will show the size of the wood piece on
the X axis, the probability of it having tree seedlings growing on it on the
Y (a binomial yes or no variable), and three fitted curves showing how the
probability of having tree seedlings changes with increasing wood piece size
for each species.
I have no problem generating fits using GLM, and no problem creating the
plot.  However, if I try to add a fitted curve based only on the hem data
subset to a plot that shows the entire dataset, I get an error message that
the lengths of those data sets differ. "Error in xy.coords(x,y) : x and y
lengths differ".  I could see R's point -- you can't plot a regression line
of babies born as a function of stork abundance on a graph of cherries
produced (Y) versus rainfall (X), which for all the program knows, I'm
trying to do.  As a temporary fix, I added NAs to the end of the hem, yb,
and sm subsets to make them the same length as the entire dataset.  I can
now add my fitted curves to the plot, but the lines are not connected.  That
is, if the hem group only contains wood pieces that are 1, 4, and 10 meters
long, the plot has an X axis that ranges from 1 to 10, but line segments for
the hem group regression line only appear above 1, 4, and 10.  How can I fix
this?  An ideal solution would not require me to make the hem subset of my
data the same length as the full dataset, either (although the summaries of
regressions with the NAs (or zeroes) added and taken away are identical).
I'd also settle for a work-around that would have R connect the pieces of
the curve so that I get a solid line rather than small dots and dashes where
actual data exist.  Thanks so much for your help!
Laura Marx
Michigan State University, Dept. of Forestry

#Note: hemdata has all the rows that are not hemlock species replaced with
#"NA"s.
hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata)

attach(logreg) #logreg is the full dataset
plot(logarea, hempresence, xlab = "Surface area of log (m2)",
ylab="Probability of hemlock seedling presence", type="n", font.lab=2,
cex.lab=1.5, axes=TRUE)
lines(logarea,fitted(hemhem), lty=1, lwd=2)
lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
lines(logarea,fitted(hemsm), lty="dotted", lwd=2)

```