[R] Fwd: How to best analyze dataset with zero-inflated loglinear dependent variable?

Fri Jun 8 17:37:54 CEST 2012

Dear netters,

Sorry for cross-posting this question. I am sure R-Help is not a
research methods discussion list, but we have many statisticians on
the list and I would like to hear from them. Would any function or
package in R be able to deal with this researcher's problem?

---------- Forwarded message ----------
From: Heidi Bertels
Date: Tue, Jun 5, 2012 at 4:31 PM
Subject: How to best analyze dataset with zero-inflated loglinear
dependent variable?
To: RMNET <rmnet at listserv.unc.edu>

Dear colleagues,

I have what I think is an interesting dataset, but I have never
analyzed anything like it. As background: over a 12-week period,
employees developed a business plan and attempted to obtain funding
for their project in teams of 3–5 corporate entrepreneurs. Data from
team members were obtained and aggregated for a total of 39 teams (we
have the same data for the executive champions of the teams). The
funding amount obtained is the dependent variable for the study. As
independent variables, there are a series of criteria (e.g., venture
team understood the business aspects) and obstacles (e.g., competition
for resources within the company) that every team member rated in
terms of how essential it was to the funding decision (or how
significant an obstacle it was). At the end of the 12-week period,
each employee was asked to rate these on a 5-point scale.

The dependent variable is "problematic" because of its distribution.
It is zero-inflated (many projects did not receive funding), and for
those projects that did receive funding, the distribution is
loglinear. I believe this is called a zero-inflated loglinear
continuous dependent variable. I can't use regular regression analysis
because the assumption of linearity is violated. Does anyone have any
ideas?

I have already done independent t-tests, but would like to go much
further. I could split the analysis into two steps: first "funded/not
funded" for all projects (logistic regression), and then the funding
amount for funded projects only (assuming there might also be
differences between projects that received high versus low funding),
but then the N gets even smaller...
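
A minimal sketch of the two-step split described above may make the idea
concrete. It is an illustration only, not the analysis used in the study:
the data are made-up toy values, there is a single predictor, and all names
(ratings, funding, fit_logistic, fit_ols, predict) are hypothetical. Step 1
fits a logistic regression for funded/not funded on all teams; step 2 fits
an ordinary least-squares line to log(funding amount) on funded teams only,
assuming the positive amounts are roughly linear on the log scale.

```python
import math

# Hypothetical toy data, NOT from the study: one mean rating per team on the
# 5-point scale, and the funding obtained (0 means the project was not funded).
ratings = [1.5, 2.0, 2.2, 2.5, 2.8, 3.0, 3.1, 3.4, 3.6, 3.8, 4.0, 4.2, 4.5, 4.8]
funding = [0, 0, 0, 12000, 0, 0, 20000, 15000, 0, 45000, 60000, 0, 150000, 200000]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (information matrix), a symmetric 2x2.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_ols(x, y):
    """Fit y = a0 + a1 * x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


# Step 1: funded / not funded, using all teams.
funded = [1 if f > 0 else 0 for f in funding]
b0, b1 = fit_logistic(ratings, funded)

# Step 2: log of the funding amount, using funded teams only.
pos_ratings = [r for r, f in zip(ratings, funding) if f > 0]
log_amounts = [math.log(f) for f in funding if f > 0]
a0, a1 = fit_ols(pos_ratings, log_amounts)


def predict(rating):
    """Return (probability of funding, geometric-mean amount if funded)."""
    return sigmoid(b0 + b1 * rating), math.exp(a0 + a1 * rating)


prob, amount = predict(4.0)
print(f"P(funded) = {prob:.2f}, typical amount if funded = {amount:.0f}")
```

Because step 2 uses only the funded teams, it rests on fewer observations
than step 1, which is the small-N concern raised above. In R, the same two
steps correspond to glm() with family = binomial for the first part and lm()
on the log of the positive amounts for the second.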

Thanks in advance,

Heidi Bertels
University of Pittsburgh