[R] plm "within" models: is the correct F-statistic reported?

Liviu Andronic landronimirc at gmail.com
Thu Mar 18 01:06:32 CET 2010


On 3/17/10, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
>  Hmm, that sounds strange. Maybe something about the data pre-processing
> went wrong?
>
I traced plm() in step-by-step mode, and the process stalls on
plm.fit(), apparently after all the pre-processing.


> Depending on how unbalanced the data is, there might not be
> enough observations.
>

It is very unbalanced.
> length(unique(kldall.sync$cusip6[]))  ##nr of individuals
[1] 3079
> summary(x1); sum(x1==1)  ##distribution of T
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    2.00    4.00    4.32    5.00   16.00
[1] 560

But this doesn't seem to be an issue for a heavily unbalanced subset of Grunfeld:
> data("Grunfeld", package = "AER")
> gr <- subset(Grunfeld[-c(1:19), ], firm %in% c("General Electric", "General Motors", "IBM"))
> dim(gr); head(gr)
[1] 41  5
   invest  value capital             firm year
20 1486.7 5593.6  2226.3   General Motors 1954
41   33.1 1170.6    97.8 General Electric 1935
42   45.0 2015.8   104.4 General Electric 1936
43   77.2 2803.3   118.0 General Electric 1937
44   44.6 2039.7   156.2 General Electric 1938
45   48.1 2256.2   172.6 General Electric 1939
> pgr <- plm.data(gr, index = c("firm", "year"))
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+  effect = "individual")
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+  effect = "time")
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+  effect = "twoways")
>  summary(gr_fe)
Twoways effects Within Model

Call:
plm(formula = invest ~ value + capital, data = pgr, effect = "twoways",
    model = "within")

Unbalanced Panel: n=3, T=1-20, N=41

Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-3.01e+01 -7.64e+00 -3.60e-15  7.64e+00  3.01e+01

Coefficients :
        Estimate Std. Error t-value Pr(>|t|)
value     0.0167     0.0165    1.01     0.32
capital   0.0468     0.0367    1.27     0.22

Total Sum of Squares:    6850
Residual Sum of Squares: 6150
F-statistic: 0.959136 on 2 and 17 DF, p-value: 0.403



> Does the lm() version of the "twoways" model work ok?
>
It works when controlling only for time, but doesn't work when
controlling only for individuals or for both (I actually kill the
process after 10-15 min). I guess that here the following applies:
"in cases where there are many individuals in the sample and we are
not interested in the value of their fixed effects, the lm() results
are awkward to deal with and the estimation of a large number of u_i
coefficients could render the problem numerically intractable." [1]
[1] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf

The alternative to dummy controls is demeaning the data (the "within"
transformation), and according to the "plm" vignette this is the
implementation for the "individual" and "time" cases. I am wondering,
though, whether "twoways" uses (or can use) the same implementation.
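
To make the demeaning idea concrete, here is a minimal base-R sketch on a made-up unbalanced panel (the names id, x, y are hypothetical and the data are simulated, not taken from this thread). It only illustrates the one-way "individual" case, where subtracting each individual's mean gives the same slope as including one dummy per individual. It says nothing about how plm() handles "twoways" internally.

```r
# Toy unbalanced panel; all names and data here are illustrative assumptions.
set.seed(1)
id <- factor(rep(1:4, times = c(2, 3, 5, 4)))     # 4 individuals, T = 2..5
x  <- rnorm(length(id))
y  <- 2 * x + as.numeric(id) + rnorm(length(id))  # individual effect = id

# Dummy-variable (LSDV) estimate: one dummy per individual.
b_lsdv <- coef(lm(y ~ x + id))["x"]

# Within estimate: subtract each individual's mean (ave() defaults to the
# group mean), then run OLS without an intercept on the demeaned data.
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_within <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

# The two slopes agree up to numerical tolerance.
all.equal(unname(b_lsdv), unname(b_within))
```

The demeaned regression never builds the dummy columns, which is why it stays cheap with thousands of individuals.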

One way to work around the failing lm() "twoways" call is to use
plm(..., effect="individual") and manually include the time effect as
a regressor (year is a factor here, so it enters as dummies).
>  gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,
+    model = "within", effect="individual")
>  summary(gr_fe2)
Oneway (individual) effect Within Model

Call:
plm(formula = invest ~ value + capital + year, data = pgr, effect = "individual",
    model = "within")

Unbalanced Panel: n=3, T=1-20, N=41

Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-3.01e+01 -7.64e+00 -4.06e-15  7.64e+00  3.01e+01

Coefficients :
         Estimate Std. Error t-value Pr(>|t|)
value      0.0167     0.0165    1.01    0.325
capital    0.0468     0.0367    1.27    0.220
year1936   1.2031    20.3416    0.06    0.954
[..]
year1954  92.5546    37.0794    2.50    0.023 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    68100
Residual Sum of Squares: 6150
F-statistic: 8.14666 on 21 and 17 DF, p-value: 0.0000286


> If so, I guess you will have to try to find out whether it's your
> preparation of the data or the fault of plm() that it does not work.
>
This mixed specification reproduces the coefficients of plm(...,
effect="twoways"), but reports a different F-statistic (on 21 and 17
DF, so it tests the year dummies jointly with value and capital).
Since the mixed specification (plm(..., effect="individual") plus a
year regressor) works fine on my data, I suspect that my data is OK
and that the plm(..., effect="twoways") implementation falters
somewhere.
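
As a rough check, both F-statistics can be approximately reproduced from the sums of squares printed in the two summaries above. The helper f_from_ss() is a hypothetical name introduced only for this illustration, and the inputs are the rounded values as printed, so the results match the reported statistics only to about two digits.

```r
# F statistic for H0: all q slope coefficients are zero,
# computed from total and residual sums of squares.
f_from_ss <- function(tss, rss, q, df_resid) {
  ((tss - rss) / q) / (rss / df_resid)
}

# Values copied from the two summaries (rounded as printed there).
f_twoways    <- f_from_ss(tss = 6850,  rss = 6150, q = 2,  df_resid = 17)
f_individual <- f_from_ss(tss = 68100, rss = 6150, q = 21, df_resid = 17)

# Compare with the reported 0.959136 and 8.14666; the small gap is
# presumably due to rounding in the printed sums of squares.
round(c(twoways = f_twoways, individual = f_individual), 2)
```

Both models have the same residual sum of squares; only the total sum of squares and the number of restrictions differ, which is what moves the F-statistic.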


>  And you want fixed effects for all >2000 individuals?
>
No, I don't think so. I am not sure how orthodox this is, but we are
only looking at the coefficients of the "main" regressors, while
controlling for time and individual variation.


>  waldtest() from "lmtest" does work in this context. Furthermore, "plm"
> provides various specialized tests for certain test problems.
>

Finally, it works!
> gr_fe1 <- plm(invest ~ value + capital, data = pgr,
+    model = "within", effect="twoways")
> summary(gr_fe1)$fstatistic

	F test

data:  invest ~ value + capital
F = 0.9591, df1 = 2, df2 = 17, p-value = 0.403
>  gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,
+    model = "within", effect="individual")
> summary(gr_fe2)$fstatistic  ##"incorrect"

	F test

data:  invest ~ value + capital + year
F = 8.1467, df1 = 21, df2 = 17, p-value = 0.00002857
> gr_fe2_null <- plm(invest ~ year, data = pgr, model = "within")
> waldtest(gr_fe2_null, gr_fe2, test="F")  ##works!
Wald test

Model 1: invest ~ year
Model 2: invest ~ value + capital + year
  Res.Df Df    F Pr(>F)
1     19
2     17  2 0.96    0.4

Thanks again
Liviu


