[R] fixed effects with clustered standard errors

Francesco cariboupad at gmx.fr
Sat Feb 11 11:35:13 CET 2012


Dear Giovanni,

I re-ran the procedure and here is the output:

Error: cannot allocate vector of size 39.2 Mb
In addition: Warning messages:
1: In structure(list(message = as.character(message), call = call),  :
  Reached total allocation of 12279Mb: see help(memory.size)
2: In structure(list(message = as.character(message), call = call),  :
  Reached total allocation of 12279Mb: see help(memory.size)
3: In structure(list(message = as.character(message), call = call),  :
  Reached total allocation of 12279Mb: see help(memory.size)
4: In structure(list(message = as.character(message), call = call),  :
  Reached total allocation of 12279Mb: see help(memory.size)
>
> traceback()
No traceback available
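
As an aside, a quick way to see where the memory is going on Windows
(where memory.size() and memory.limit() apply) would be something like
the following sketch:

## free unused memory and report current usage / allocation limit
gc()
memory.size()    # Mb currently used by R (Windows only)
memory.limit()   # current allocation limit in Mb (Windows only)

## the largest objects in the workspace
head(sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE), 5)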

I had a similar problem with another statistical package, although in a
different context: fitting a fixed-effects logit model.
That software did not converge because, as you rightly guessed at the
beginning of this thread, the number of points for some individuals is
too high.

This could well be the source of the error here too.
The median number of points in my database is quite low, around 10, but
I have individuals with more than 2,000, 5,000 or even 50,000 points!
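
For reference, this distribution can be checked with something like the
following (a sketch: the data frame mydata and the identifier column id
are placeholder names):

## number of observations per individual
obs.per.id <- table(mydata$id)

## overall distribution: median around 10, but with a very long right tail
summary(as.numeric(obs.per.id))
quantile(as.numeric(obs.per.id), probs = c(0.5, 0.9, 0.99, 1))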

What do you think?

Many thanks
Best,

On 9 February 2012 10:31, John L <cariboupad at gmx.fr> wrote:
> Dear Giovanni,
>
> Many thanks for your interesting suggestions.
> Your guess is indeed right: I only use the 'within' fixed-effects specification.
>
> I will soon send to this list all the additional information you
> requested in order to understand what might cause this problem, but my
> first guess is that the inefficiency is probably due to individuals
> with too many data points: the median number of points is 10, but I
> have some individuals with more than 1,000, 5,000 or even 80,000
> points overall!
> So basically my dataset is probably too unusual, as you suggested,
> compared to the "standard" panel dataset in the social sciences...
>
> To be continued... ;-)
> Many thanks again
>
> Best,
>
> On 8 February 2012 18:55, Millo Giovanni [via R]
> <ml-node+s789695n4370302h35 at n4.nabble.com> wrote:
>> Dear John,
>>
>> Interesting. There must be a bottleneck somewhere, which possibly went
>> unnoticed because econometricians seldom use so many data points. In
>> fact 'plm' wasn't designed to handle "only" 700 MB of data at a time,
>> but we are happy to investigate in this direction too. For example, I
>> was aware of some efficiency problems with effect="twoways", but I
>> understand that you are using effect="individual"? Which takes me to
>> the main point.
>>
>> I understand that enclosing the data for a reproducible report, as
>> requested by the posting guide, is awkward for such a big dataset. Yet
>> it would be of great help if you at least produced:
>>
>> - an output of your procedure, in order to see what goes wrong and where
>> - the output of traceback() called immediately after you got the error
>> (idem)
>>
>> and possibly gave it a try with lm() applied to the very same formula
>> and data, perhaps wrapped in a system.time( ... ) call.
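>>
>> Something along these lines, where the formula and the data frame name
>> are of course placeholders for your own:
>>
>> ## time a plain OLS fit on the very same formula and data
>> system.time(ols <- lm(y ~ x1 + x2, data = mydata))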
>>
>> Otherwise, the information you provide is far too scant to even make an
>> educated guess. For example, it isn't clear whether the problem is
>> related to plm() or to vcovHC.plm, etc.
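>>
>> To be clear, the two steps I have in mind look something like this
>> (a sketch: the formula, the data frame mydata and the index column id
>> are placeholders):
>>
>> library(plm)
>> library(lmtest)
>> ## step 1: 'within' (fixed effects) estimation
>> fe <- plm(y ~ x1 + x2, data = mydata, index = "id", model = "within")
>> ## step 2: standard errors clustered by individual
>> coeftest(fe, vcov = vcovHC(fe, method = "arellano", cluster = "group"))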
>>
>> As far as "simple demeaning" is concerned, you might try the following
>> code, which really does only that. Be aware that **standard errors are
>> biased**: this is not meant to be a proper function, just a
>> computational test for your data and a quick demonstration of
>> demeaning. 'plm()' is far more structured, for a number of reasons.
>> Please execute it inside system.time() again.
>>
>> ######### test function for within model, BIASED SEs !! #############
>> ##
>> ## ## example:
>> ## data(Produc, package="plm")
>> ## mod <- FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
>> ##              index = Produc$state, data = Produc)
>> ## summary(mod)
>> ## ## compare with:
>> ## library(plm)
>> ## example(plm)
>>
>> demean <- function(x, index, lambda = 1, na.rm = FALSE) {
>>   ## subtract (lambda times) the group mean of x from each observation
>>   as.vector(x - lambda * tapply(x, index, mean, na.rm = na.rm)[unclass(as.factor(index))])
>> }
>> FEmod <- function(formula, index, data = ls()) {
>>
>>   ## fit a model without intercept in any case
>>   formula <- as.formula(paste(deparse(formula(formula)), "-1", sep = ""))
>>   X <- model.matrix(formula, data = data)
>>   y <- model.response(model.frame(formula, data = data))
>>   ## reduce index accordingly
>>   names(index) <- row.names(data)
>>   ind <- index[which(names(index) %in% row.names(X))]
>>
>>   ## within transf.
>>   MX <- matrix(NA, ncol = dim(X)[[2]], nrow = dim(X)[[1]])
>>   for (i in 1:dim(X)[[2]]) {
>>     MX[, i] <- demean(X[, i], index = ind, lambda = 1)
>>   }
>>   My <- demean(y, index = ind, lambda = 1)
>>
>>   ## estimate within model
>>   femod <- lm(My ~ MX - 1)
>>
>>   return(femod)
>> }
>> ####### end test function ########
>>
>>
>> Best,
>> Giovanni
>>
>> ########### original message #############
>>
>> ------------------------------
>>
>> Message: 28
>> Date: Tue, 07 Feb 2012 15:35:07 +0100
>> From: [hidden email]
>> To: [hidden email]
>> Subject: [R] fixed effects with clustered standard errors
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Dear R-helpers,
>>
>> I have a very simple question and I really hope that someone can help
>> me.
>>
>> I would like to estimate a simple fixed-effects regression model with
>> standard errors clustered by individual.
>> For those using Stata, the counterpart would be xtreg with the "fe"
>> option, or areg with the "absorb" option; in both cases the clustering
>> is achieved with "vce(cluster id)".
>>
>> My question is: how could I do that with R? An important point is that
>> I have too many individuals, so I cannot include dummies and should
>> use the usual demeaning procedure.
>> I tried the plm package with the "within" option, but R quickly tells
>> me that the memory limits are reached (I have over 10 GB of RAM!)
>> while the dataset is only 700 MB (about 50 000 individuals, highly
>> unbalanced).
>> I don't understand... plm does indeed demean the data, so the
>> computation should be fast and light enough... ?!
>>
>> Are there any other solutions?
>> Many thanks in advance! ;)
>> John
>>
>>
>> ############ end original message ############
>> Giovanni Millo, PhD
>> Research Dept.,
>> Assicurazioni Generali SpA
>> Via Machiavelli 4,
>> 34132 Trieste (Italy)
>> tel. +39 040 671184
>> fax  +39 040 671160
>>
>>
>>
>
>
>
>
>


