[R] Conditional logistic regression for "events/trials" format
Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
cro6 at CDC.GOV
Thu May 31 19:58:13 CEST 2007
Thanks for your reply Charles. I do indeed have other variables. I
apologize for being vague, here is my study in more detail:
I have a cohort of births. My outcome is a dichotomous variable for
presence/absence of a birth defect. For each cohort member I estimate
the date of conception, and assign a pollution level during the relevant
period of gestation. All cohort members conceived on the same day are
assigned the same pollution level. These cohort members also have a
covariate, t, which indicates the day of follow-up. For example, if the
first day of my study is Jan 1, 1987, the data would look like:
Date          t     Conceptions   Cases
Jan 1, 1987   1     100           1
Jan 2, 1987   2     105           0
Jan 3, 1987   3     101           1
Jan 1, 1988   366   109           1
Jan 2, 1988   367   111           2
Jan 3, 1988   368   103           0
I make matched pairs of days (Strata) to control for the influence of
season. I also want to account for long-term trends, e.g., increasing
birth defect ascertainment and decreasing pollution levels over time, so
I want to fit a cubic spline in the variable t.
I have already analyzed this data as a time series (I don't use the
Stratum variable in the time-series analyses), but now I am exploring
some alternatives. My full dataset has 3,115 strata.
So my final model would look like: clogit(Cases/Conceptions ~ Pollution
+ f(t) + strata(Stratum)).
So, just to reiterate, my goal is to fit this model without having to
bring in the individual-level data. I would be just as happy to do a
conditional Poisson as I would be to do a conditional logistic
regression - either would seem to be appropriate here - if that opens up
some other options.
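For the conditional Poisson option, one possible sketch on grouped data is an ordinary Poisson glm() with one intercept per stratum and log(Conceptions) as an offset. For a Poisson model the coefficient estimates from this stratum-indicator fit should coincide with those from conditioning on the stratum totals, although with thousands of strata the model matrix gets large. The column names follow the table above; the simulated values, the pairing of each day with the day 365 days later, and the use of ns(t, df = 3) as a stand-in for f(t) are assumptions made only for illustration.

```r
library(splines)  # ns(): natural cubic spline basis, ships with R

set.seed(42)
n_strata <- 200

# Hypothetical matching scheme: each stratum pairs a day with the day
# 365 days later. The real matching may differ.
days <- data.frame(
  Stratum = rep(seq_len(n_strata), times = 2),
  t       = c(seq_len(n_strata), seq_len(n_strata) + 365)
)
days$Pollution   <- rnorm(nrow(days), mean = 10, sd = 2)
days$Conceptions <- rpois(nrow(days), lambda = 100) + 1
days$Cases       <- rpois(
  nrow(days),
  lambda = days$Conceptions * 0.02 * exp(0.05 * (days$Pollution - 10))
)

# Strata with no cases carry no information about Pollution once the
# stratum effect is removed, and they push stratum intercepts towards
# -Inf, so drop them before fitting.
days$StratumCases <- ave(days$Cases, days$Stratum, FUN = sum)
informative <- days[days$StratumCases > 0, ]

# Poisson regression with a stratum factor and an offset for the
# number of conceptions at risk on each day.
fit <- glm(
  Cases ~ Pollution + ns(t, df = 3) + factor(Stratum) +
    offset(log(Conceptions)),
  family = poisson,
  data   = informative
)

# Log rate ratio per unit of Pollution
coef(fit)["Pollution"]
```

The stratum coefficients are nuisance parameters here; only the Pollution and spline terms are of interest.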
Thanks very much for your time and interest,
Birth Defects Branch
U.S. Centers for Disease Control and Prevention
From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu]
Sent: Thursday, May 31, 2007 1:12 PM
To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Cc: r-help at stat.math.ethz.ch; tlumley at u.washington.edu
Subject: Re: [R] Conditional logistic regression for "events/trials"
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:
> Dear R users,
> I have a large individual-level dataset (~700,000 records) which I am
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to
> which each person belongs.
> Using this individual-level dataset I can successfully use clogit to
> create the model I want. However reading this large .csv file into R
> and running the models takes a fair amount of time.
> Alternatively, I could choose to "collapse" the dataset so that each
> row has the number of events, number of individuals, and the exposure
> and stratum. In SAS they call this the "events/trials" format. This
> would make my dataset much smaller and presumably speed things up.
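The collapsing step described above could be sketched in R roughly as follows. The data frame and its column names (stratum, exposure, outcome) are hypothetical stand-ins for the individual-level file, which is not shown in this thread.

```r
# Hypothetical individual-level data: one row per person.
set.seed(1)
indiv <- data.frame(
  stratum  = rep(1:3, each = 8),
  exposure = rep(c(0, 1), times = 12),
  outcome  = rbinom(24, 1, 0.3)
)

# Helper column so that summing it counts individuals.
indiv$n <- 1

# One row per stratum-by-exposure cell.
grouped <- aggregate(
  indiv[c("outcome", "n")],
  by  = indiv[c("stratum", "exposure")],
  FUN = sum
)

# Rename to the "events/trials" vocabulary.
names(grouped)[names(grouped) == "outcome"] <- "events"
names(grouped)[names(grouped) == "n"]       <- "trials"
grouped <- grouped[order(grouped$stratum, grouped$exposure), ]

print(grouped)
```

Each row of grouped then holds the number of events and the number of individuals for one exposure level within one stratum.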
I think you have described the data for forming a 2 by 2 by K table of
counts.
In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
too large - glm(... , family=poisson) would be suitable.
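As a rough illustration of the 2 by 2 by K route, grouped counts can be arranged into a three-way array and passed to mantelhaen.test(). The counts and column names below are made up for the example.

```r
# Hypothetical grouped data: two rows per stratum (unexposed, exposed).
grouped <- data.frame(
  stratum  = rep(1:4, each = 2),
  exposure = rep(c(0, 1), times = 4),
  events   = c(3, 6, 2, 5, 4, 7, 1, 4),
  trials   = c(50, 52, 48, 51, 55, 49, 47, 53)
)
grouped <- grouped[order(grouped$stratum, grouped$exposure), ]

exposed   <- grouped[grouped$exposure == 1, ]
unexposed <- grouped[grouped$exposure == 0, ]
K <- nrow(exposed)

# 2 x 2 x K array: exposure by outcome by stratum.
tab <- array(
  NA_real_,
  dim = c(2, 2, K),
  dimnames = list(
    exposure = c("exposed", "unexposed"),
    outcome  = c("event", "nonevent"),
    stratum  = as.character(exposed$stratum)
  )
)
tab["exposed",   "event",    ] <- exposed$events
tab["exposed",   "nonevent", ] <- exposed$trials - exposed$events
tab["unexposed", "event",    ] <- unexposed$events
tab["unexposed", "nonevent", ] <- unexposed$trials - unexposed$events

mh <- mantelhaen.test(tab)
mh$estimate  # Mantel-Haenszel common odds ratio, exposed vs unexposed
mh$p.value
```

With the exposed row first, an estimate above 1 indicates higher odds of the event among the exposed.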
But you say 'models' above suggesting that there are some other
variables. If so, you need to be a bit more specific in describing your
> So my question is: can I use clogit (or possibly another function) to
> perform a conditional logistic regression when the data is in this
> "events/trials" format? I am using R version 2.5.0.
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
Charles C. Berry (858) 534-2098
Dept of Family/Preventive
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901