[R] Conditional logistic regression for "events/trials" format

Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) cro6 at CDC.GOV
Thu May 31 19:58:13 CEST 2007

Thanks for your reply, Charles. I do indeed have other variables. I
apologize for being vague; here is my study in more detail:

I have a cohort of births. My outcome is a dichotomous variable for
presence/absence of a birth defect. For each cohort member I estimate
the date of conception, and assign a pollution level during the relevant
period of gestation. All cohort members conceived on the same day are
assigned the same pollution level. These cohort members also have a
covariate, t, which indicates the day of follow-up. For example, if the
first day of my study is Jan 1, 1987, the data would look like:

Date            t   Conceptions   Cases   Pollution   Stratum
Jan 1, 1987     1           100       1          10         1
Jan 2, 1987     2           105       0           8         2
Jan 3, 1987     3           101       1          11         3
Jan 1, 1988   366           109       1          13         1
Jan 2, 1988   367           111       2          19         2
Jan 3, 1988   368           103       0          14         3
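A minimal sketch of moving between the two layouts, using the six example rows above. The individual-level records are reconstructed purely for illustration (which records within a day carry the defect is arbitrary here), and aggregate() then collapses them back to one events/trials row per day.

```r
## The six example rows above, in events/trials form
days <- data.frame(
  Date        = c("Jan 1, 1987", "Jan 2, 1987", "Jan 3, 1987",
                  "Jan 1, 1988", "Jan 2, 1988", "Jan 3, 1988"),
  t           = c(1, 2, 3, 366, 367, 368),
  Conceptions = c(100, 105, 101, 109, 111, 103),
  Cases       = c(1, 0, 1, 1, 2, 0),
  Pollution   = c(10, 8, 11, 13, 19, 14),
  Stratum     = c(1, 2, 3, 1, 2, 3)
)

## Expand to one record per conception (hypothetical individual-level layout):
## within each day the first `Cases` records are coded 1 (defect), the rest 0.
idx   <- rep(seq_len(nrow(days)), days$Conceptions)
indiv <- days[idx, c("t", "Pollution", "Stratum")]
indiv$defect <- as.integer(sequence(days$Conceptions) <= days$Cases[idx])

## Collapse back: Cases = number of defects, Conceptions = number of records.
agg <- aggregate(data.frame(Cases = indiv$defect, Conceptions = 1L),
                 by = indiv[c("t", "Pollution", "Stratum")], FUN = sum)
agg <- agg[order(agg$t), ]
agg
```

The collapsed table has one row per day no matter how many conceptions fall on that day, which is where the savings over the ~700,000-record file come from.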

I make matched pairs of days (strata) to control for the influence of
season. I also want to account for long-term trends, e.g., increasing
birth defect ascertainment and decreasing pollution levels over time, so
I want to fit a cubic spline using the variable t.

I have already analyzed this data as a time series (I don't use the
Stratum variable in the time-series analyses), but now I am exploring
some alternatives. My full dataset has 3,115 strata.

So, schematically, with f(t) the cubic spline in t, my final model would
look like: clogit(Cases/Conceptions ~ Pollution + f(t) + strata(Stratum)).
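A minimal sketch of the conditional Poisson version of this model, on simulated data (the counts, the year-to-year pairing rule and the spline's df = 4 are all assumptions for illustration). It relies on the standard result that a Poisson model with a separate intercept for each stratum gives the same Pollution estimate as the conditional Poisson likelihood, which conditions on each stratum's total case count. log(Conceptions) enters as an offset, so the Pollution coefficient is a log rate ratio.

```r
library(splines)   # ns(): natural cubic spline basis (ships with R)

## Simulated, purely hypothetical data: two years of daily counts, with each
## calendar day of year 1 paired with the same calendar day of year 2.
set.seed(1)
n_days <- 730
dat <- data.frame(t = seq_len(n_days))
dat$Stratum     <- (dat$t - 1) %% 365 + 1          # hypothetical pairing rule
dat$Conceptions <- rpois(n_days, 100)
dat$Pollution   <- 12 - 0.005 * dat$t + rnorm(n_days, sd = 3)
## assumed truth: log rate ratio 0.05 per unit Pollution, plus a slow trend in t
dat$Cases <- rpois(n_days, dat$Conceptions *
                     exp(-4 + 0.05 * dat$Pollution + 0.0005 * dat$t))

## Poisson regression with one intercept per stratum, a spline in t for the
## long-term trend, and log(Conceptions) as the offset.
fit <- glm(Cases ~ Pollution + ns(t, df = 4) + factor(Stratum) +
             offset(log(Conceptions)),
           family = poisson, data = dat)
b  <- coef(fit)["Pollution"]
se <- sqrt(vcov(fit)["Pollution", "Pollution"])
exp(b + c(est = 0, lower = -1.96, upper = 1.96) * se)   # rate ratio and 95% CI
```

If each of the 3,115 strata is a pair of days, the factor(Stratum) dummies give a model matrix of roughly 6,230 rows by more than 3,000 columns, which glm() can handle but slowly. The gnm package's eliminate argument is one option intended to avoid estimating those stratum intercepts explicitly; it is not shown here.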

So, just to reiterate, my goal is to fit this model without having to
bring in the individual-level data. I would be just as happy to do a
conditional Poisson regression as a conditional logistic regression
(either would seem to be appropriate here) if that opens up some other
options.
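To see how close the two approaches are for a rare outcome, here is a sketch on simulated events/trials data (all numbers hypothetical, prevalence around 3%). It fits the conditional Poisson model to the collapsed counts, then expands the same counts to one record per conception and fits clogit() from the survival package. The expansion is there only for the comparison; it is exactly the step the collapsed format is meant to avoid.

```r
library(survival)   # clogit()

## Simulated, purely hypothetical events/trials data: 60 strata (pairs of days),
## roughly 3% prevalence, true log risk ratio 0.05 per unit Pollution.
set.seed(4)
dat <- data.frame(Stratum = rep(1:60, each = 2))
dat$Conceptions <- rpois(nrow(dat), 150)
dat$Pollution   <- rnorm(nrow(dat), mean = 10, sd = 3)
dat$Cases <- rbinom(nrow(dat), dat$Conceptions,
                    exp(-4 + 0.05 * dat$Pollution))

## (1) Conditional Poisson, fitted as Poisson with one intercept per stratum.
pois <- glm(Cases ~ Pollution + factor(Stratum) + offset(log(Conceptions)),
            family = poisson, data = dat)

## (2) Conditional logistic regression on the expanded individual records:
## within each day the first `Cases` records are coded 1, the rest 0.
idx   <- rep(seq_len(nrow(dat)), dat$Conceptions)
indiv <- dat[idx, c("Pollution", "Stratum")]
indiv$y <- as.integer(sequence(dat$Conceptions) <= dat$Cases[idx])
clog  <- clogit(y ~ Pollution + strata(Stratum), data = indiv)

est <- c(poisson = unname(coef(pois)["Pollution"]),
         clogit  = unname(coef(clog)["Pollution"]))
est
```

With an outcome this rare the two Pollution estimates should be close. As the outcome becomes more common they drift apart, because clogit() estimates a log odds ratio while the Poisson model estimates a log rate ratio.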

Thanks very much for your time and interest,
Matt Strickland
Birth Defects Branch
U.S. Centers for Disease Control and Prevention


-----Original Message-----
From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu] 
Sent: Thursday, May 31, 2007 1:12 PM
To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Cc: r-help at stat.math.ethz.ch; tlumley at u.washington.edu
Subject: Re: [R] Conditional logistic regression for "events/trials"

On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
> I have a large individual-level dataset (~700,000 records) which I am 
> performing a conditional logistic regression on. Key variables include

> the dichotomous outcome, dichotomous exposure, and the stratum to 
> which each person belongs.
> Using this individual-level dataset I can successfully use clogit to 
> create the model I want. However reading this large .csv file into R 
> and running the models takes a fair amount of time.
> Alternatively, I could choose to "collapse" the dataset so that each 
> row has the number of events, number of individuals, and the exposure 
> and stratum. In SAS they call this the "events/trials" format. This 
> would make my dataset much smaller and presumably speed things up.

I think you have described the data for forming a 2 by 2 by K table of
outcome by exposure by stratum.

In that case, loglin(), loglm(), mantelhaen.test(), and (if K is not
too large) glm(..., family = poisson) would be suitable.
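A sketch of the 2 by 2 by K route on simulated counts (K = 20; all values hypothetical): reshape events/trials rows into an Exposure by Outcome by Stratum table, get the Mantel-Haenszel common odds ratio, and fit the log-linear Poisson model whose Exposure:Outcome term is the common log odds ratio. The same coefficient also comes out of a binomial glm() fitted directly to the events/trials rows with stratum dummies; with large strata, as here, these unconditional estimates are close to the conditional ones.

```r
## Simulated, purely hypothetical events/trials data: K strata, binary exposure.
set.seed(2)
K   <- 20
agg <- expand.grid(Exposure = c(0, 1), Stratum = seq_len(K))
agg$Stratum <- factor(agg$Stratum)
agg$Trials  <- rpois(nrow(agg), 500)
base        <- rnorm(K, mean = -3, sd = 0.3)     # stratum-specific log odds
agg$Events  <- rbinom(nrow(agg), agg$Trials,
                      plogis(base[as.integer(agg$Stratum)] + log(1.5) * agg$Exposure))

## Reshape to one row per Exposure x Outcome x Stratum cell.
cells <- agg[c("Exposure", "Stratum")]
long  <- rbind(cbind(cells, Outcome = "noncase", Freq = agg$Trials - agg$Events),
               cbind(cells, Outcome = "case",    Freq = agg$Events))
long$Outcome <- factor(long$Outcome, levels = c("noncase", "case"))

## (a) Mantel-Haenszel common odds ratio from the 2 x 2 x K table.
tab <- xtabs(Freq ~ Exposure + Outcome + Stratum, data = long)
mh  <- mantelhaen.test(tab)

## (b) Log-linear Poisson model: Exposure:Outcome is the common log odds ratio.
ll <- glm(Freq ~ Exposure * Stratum + Outcome * Stratum + Exposure:Outcome,
          family = poisson, data = long)
b_ll <- coef(ll)[grepl("Exposure", names(coef(ll))) &
                 grepl("Outcome",  names(coef(ll)))]

## (c) The same coefficient from a binomial fit in events/trials form.
bin   <- glm(cbind(Events, Trials - Events) ~ Exposure + Stratum,
             family = binomial, data = agg)
b_bin <- coef(bin)["Exposure"]

c(mh = unname(log(mh$estimate)), loglinear = unname(b_ll),
  binomial = unname(b_bin))
```

The log-linear and binomial fits agree to numerical precision because they are the same model written two ways; both carry one dummy per stratum, which is the "if K is not too large" caveat.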

But you say 'models' above, suggesting that there are some other
variables. If so, you need to be a bit more specific in describing your
data.

> So my question is: can I use clogit (or possibly another function) to 
> perform a conditional logistic regression when the data is in this 
> "events/trials" format? I am using R version 2.5.0.
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901
