[R] non-linear estimation with many firm-specific parameters

ivo welch ivowel at gmail.com
Sun May 9 22:14:17 CEST 2010

Dear R experts---

I doubt that someone has already solved my problem, but I thought I
would ask quickly, just in case someone has.

Let's say I start with a (flattened panel) model that says
    y[i] = x[i] + b*(T-x[i])
easy enough---this is just a linear model.  I could also make this a
fixed-effects model if I change T to T[fmid], where fmid is the firm's
id.  I know I can do this faster, but logically, what I want to
estimate is lm( y ~ as.factor(fmid) + x ).  I have about 100,000
observations, and about 10,000 firm ids.
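
To make the linear case concrete, here is a minimal sketch on simulated
data.  The sample sizes, the true parameter values, and every name other
than y, x, and fmid are made up for illustration.  It fits the
dummy-variable regression and the faster within-firm (demeaned)
regression, which should give the same slope on x.  Since
y[i] = (1-b)*x[i] + b*T[fmid], that slope estimates 1-b.

```r
set.seed(1)
n_firms <- 50                          # toy sizes, far below 10,000 firms
n_per   <- 20
fmid    <- rep(seq_len(n_firms), each = n_per)
T_firm  <- rnorm(n_firms, mean = 2)    # hypothetical firm-specific T
b_true  <- 0.3                         # hypothetical value of b
x <- rnorm(n_firms * n_per)
y <- x + b_true * (T_firm[fmid] - x) + rnorm(length(x), sd = 0.1)

# Dummy-variable version: one intercept shift per firm.
fit_dummy <- lm(y ~ as.factor(fmid) + x)

# Within version: subtract firm means, then regress without an intercept.
y_dm <- y - ave(y, fmid)
x_dm <- x - ave(x, fmid)
fit_within <- lm(y_dm ~ x_dm - 1)

slope_dummy  <- coef(fit_dummy)[["x"]]
slope_within <- coef(fit_within)[["x_dm"]]
b_hat <- 1 - slope_within              # slope on x is 1 - b
```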

Now, let me move to a world in which b is a function of the distance
between T and x, b = a + c*(T-x[i])^2:
   y[i] = x[i] + b(T,x[i]) * (T-x[i]) =  x[i] + (a+c*(T-x[i])^2) * (T-x[i])
R solves this nicely with the nls() function in about 5 seconds.  The
results are estimates for a, c, and T.
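
A minimal sketch of the single-T fit on simulated data follows.  The true
values, the sample size, and the starting-value scheme are assumptions for
illustration, not the original setup.  The parameters are named cc and Tg
in the code so that they do not mask R's c() and T (which is TRUE).
Starting values come from a cubic regression, because expanding the model
gives a cubic polynomial in x.

```r
set.seed(2)
n <- 2000
a_true <- 0.3; c_true <- 0.1; T_true <- 2     # made-up values
x <- rnorm(n)
y <- x + (a_true + c_true * (T_true - x)^2) * (T_true - x) +
  rnorm(n, sd = 0.1)

# Expanding the model:
#   y - x = (a*T + c*T^3) - (a + 3*c*T^2)*x + 3*c*T*x^2 - c*x^3
# so a cubic regression of (y - x) on x yields rough starting values.
p  <- unname(coef(lm(I(y - x) ~ x + I(x^2) + I(x^3))))
c0 <- -p[4]                       # coefficient on x^3 is -c
T0 <- p[3] / (3 * c0)             # coefficient on x^2 is 3*c*T
a0 <- -p[2] - 3 * c0 * T0^2       # coefficient on x is -(a + 3*c*T^2)

fit <- nls(y ~ x + (a + cc * (Tg - x)^2) * (Tg - x),
           start = list(a = a0, cc = c0, Tg = T0))
est <- coef(fit)                  # named vector with a, cc, Tg
```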

Here comes the hard part.  I want to make T firm-specific again,
i.e., T[fmid].  In effect, I want

   y[i] = x[i] + (a+c*(T[fmid]-x[i])^2) * (T[fmid] - x[i])

where the firm-specific constants are supposed to be the same in the
two terms (i.e., not the permutative set).  The usual trick to speed
up fixed-effects estimations (i.e., subtracting out the firm means) does
not work here, because the model is non-linear in T[fmid].  I am thinking about
expanding the dummies into an appropriate matrix, then coding my
problem into an objective function, and letting R optimize over my,
ahem, 10,000 or so T[fmid], plus a and c.  I fear that this would not only
overwhelm my CPU (taking a few days, which would be ok), but overwhelm
my memory, too.  Maybe it is just plain infeasible.
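
One way such an objective function could be coded without ever building
a dummy matrix is sketched below, on simulated toy data.  This is an
illustration under assumptions, not a tested solution for the real panel:
the data, the true values, the starting values, and the choice of
optim() with method "L-BFGS-B" are all made up here.  The idea is that
T[fmid] is plain vector indexing, and the gradient with respect to each
firm's T is a per-firm sum, which rowsum() computes directly.

```r
set.seed(3)
n_firms <- 100; n_per <- 20                    # toy sizes
fmid   <- rep(seq_len(n_firms), each = n_per)  # must be coded 1..n_firms
T_firm <- rnorm(n_firms, mean = 2, sd = 0.5)   # made-up firm-specific T
a_true <- 0.3; c_true <- 0.1                   # made-up values
x <- rnorm(n_firms * n_per)
d_true <- T_firm[fmid] - x
y <- x + (a_true + c_true * d_true^2) * d_true + rnorm(length(x), sd = 0.1)

# par = c(a, cc, T[1], ..., T[n_firms]); the model is
#   y - x = a*d + cc*d^3  with  d = T[fmid] - x
resid_fn <- function(par) {
  d <- par[-(1:2)][fmid] - x
  list(d = d, r = y - x - par[1] * d - par[2] * d^3)
}
sse_fn <- function(par) sum(resid_fn(par)$r^2)
sse_gr <- function(par) {                      # analytic gradient of the SSE
  z   <- resid_fn(par)
  g_T <- rowsum(z$r * (par[1] + 3 * par[2] * z$d^2), fmid)  # per-firm sums
  -2 * c(sum(z$r * z$d), sum(z$r * z$d^3), as.vector(g_T))
}

# Starting values from a pooled cubic regression (an assumption):
#   y - x = (a*T + c*T^3) - (a + 3*c*T^2)*x + 3*c*T*x^2 - c*x^3
p  <- unname(coef(lm(I(y - x) ~ x + I(x^2) + I(x^3))))
c0 <- -p[4]; T0 <- p[3] / (3 * c0); a0 <- -p[2] - 3 * c0 * T0^2
start <- c(a0, c0, rep(T0, n_firms))

fit <- optim(start, sse_fn, sse_gr, method = "L-BFGS-B",
             control = list(maxit = 1000))
a_hat <- fit$par[1]; c_hat <- fit$par[2]; T_hat <- fit$par[-(1:2)]
```

With 10,000 firms the parameter vector would have 10,002 entries.
L-BFGS-B keeps only a few vectors of that length rather than a full
Hessian, so memory should not be the binding constraint.  Whether it
converges to a sensible optimum on real data is a separate question.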

Has anyone seen someone else work on such a problem?


/ivo welch

Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
