[R] model syntax processed --- probably common

ivo welch ivo.welch at gmail.com
Mon Aug 19 22:05:26 CEST 2013


thank you.  but uggh...sorry for my html post.  and sorry again for
having been obscure in my attempt to be brief.  here is a working
program.

fama.macbeth <- function( formula, din ) {
  fnames <- terms( formula )
  dnames <- names( din )
  stopifnot( all(dimnames(attr(fnames, "factors"))[[1]] %in%  dnames) )

  monthly.regressions <- by( din, as.factor(din$month), function(dd)
coef(lm(model.frame( formula, data=dd ))))
  as.m <- do.call("rbind", monthly.regressions)
  colMeans(as.m)
}
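
A possible variant of the check above, as a minimal sketch. all.vars() returns the bare variable names used in a formula, so a term such as log(x) is checked as x, whereas the row names of the "factors" attribute would contain the string "log(x)". The helper name check.formula.vars and the wording of the error message are hypothetical and are not part of the function above.

```r
## hypothetical helper (an assumption, not from the original post):
## stop with a readable message if the formula uses unknown columns
check.formula.vars <- function( formula, din ) {
  needed <- all.vars( formula )                   # bare variable names in the formula
  missing.vars <- setdiff( needed, names(din) )   # those that are not columns of din
  if (length(missing.vars) > 0)
    stop( "formula refers to variables not in the data frame: ",
          paste(missing.vars, collapse=", "), call.=FALSE )
  invisible(TRUE)
}

## small made-up data set for illustration
d0 <- data.frame( month=rep(1:5,10), y=rnorm(50), x=rnorm(50) )

check.formula.vars( y ~ log(x), d0 )   # passes silently: y and x both exist

## w is not a column of d0, so this call raises an error; capture its message
msg <- tryCatch( check.formula.vars( y ~ x + w, d0 ),
                 error=function(e) conditionMessage(e) )
print(msg)
```

One caveat: a formula that uses the dot shorthand (y ~ .) would report "." as a missing variable, so this sketch assumes that all variables are spelled out.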

## a test data set
d <- data.frame( month=rep(1:5,10), y= rnorm(50), x= rnorm(50), z=rnorm(50) )

## this works beautifully, exactly how I want it.  the names are
## there, the formula works.
print(fama.macbeth( y ~ x , din=d ))

## now I want something like the following statement to work, too
for (nm in c("x")) print(fama.macbeth( y ~ nm, din=d ))
   or
for (nm in c("x")) print(fama.macbeth( y ~ d[[nm]], din=d ))
  or whatever.

the output in both cases should be the same, preferably even knowing
that the name of the variable is really "x" and not nm.  is there a
standard common way to do this?

regards,

/iaw

----
Ivo Welch (ivo.welch at gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Editor, Critical Finance Review, http://www.critical-finance-review.org/



On Mon, Aug 19, 2013 at 12:48 PM, David Winsemius
<dwinsemius at comcast.net> wrote:
>
> On Aug 19, 2013, at 9:45 AM, ivo welch wrote:
>
>> dear R experts---I was programming a fama-macbeth panel regression (a
>> fama-macbeth regression is essentially T cross-sectional regressions, with
>> statistics then obtained from the time-series of coefficients), partly
>> because I wanted faster speed than plm, partly because I wanted some
>> additional features.
>>
>> my function starts as
>>
>> fama.macbeth <- function( formula, din ) {
>>   names <- terms( formula )
>>  ## omitted : I want an immediate check that the formula refers to
>> existing variables in the data frame with English error messages
>>
>
> Look at the structure of a terms result from a formula argument with str():
>
>  fama.macbeth <- function( formula, din ) {
>    fnames <- terms( formula ) ; str(fnames)
>  }
>
>> fama.macbeth( x ~ y, data.frame(x=rnorm(10), y=rnorm(10) ) )
> Classes 'terms', 'formula' length 3 x ~ y
>   ..- attr(*, "variables")= language list(x, y)
>   ..- attr(*, "factors")= int [1:2, 1] 0 1
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : chr [1:2] "x" "y"
>   .. .. ..$ : chr "y"
>   ..- attr(*, "term.labels")= chr "y"
>   ..- attr(*, "order")= int 1
>   ..- attr(*, "intercept")= int 1
>   ..- attr(*, "response")= int 1
>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
>
> Then extract the dimnames from the "factors" attribute to compare to the names in the data-object:
>
>> fama.macbeth <- function( formula, din ) {
>   fnames <- terms( formula ) ;  dnames <- names( din)
>   dimnames(attr(fnames, "factors"))[[1]] %in%  dnames
> }
> #[1] TRUE TRUE
>
>
> I couldn't tell if this was the main thrust of your question. It seems to meander a bit.
>
> --
> David.
>
>> monthly.regressions <- by( din, as.factor(din$month), function(dd)
>> coef(lm(model.frame( formula, data=dd ))))
>>   as.m <- do.call("rbind", monthly.regressions)
>>   colMeans(as.m)  ## or something like this.
>> }
>> say my data frame mydata has columns named month, r, laggedx and ... .  I
>> can call this function
>>
>>   fama.macbeth( r ~ laggedx, din=mydata )
>>
>> but it fails
>
> What fails?
>
>
>> if I want to compute my x variables.  for example,
>>
>>   myx <- d[,"laggedx"]
>>   fama.macbeth( r ~ myx)
>>
>> I also wish that the computed myx still remembered that it was really
>> laggedx.  it's almost as if I should not create a vector myx but a data
>> frame myx to avoid losing the column name.
>
> I wouldn't say "almost"... rather that is exactly what you should do. R regression methods almost always work better when formulas are interpreted in the environment of the data argument.
>
>>  I wonder why such vectors don't
>> keep a name attribute of some sort.
>>
>> there is probably an "R way" of doing this.  is there?
>>
>> /iaw
>>
>> ----
>> Ivo Welch (ivo.welch at gmail.com)
>>
>>       [[alternative HTML version deleted]]
>
> Still posting HTML?
>
>>
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> And do explain what the goal is.
>
> --
>
> David Winsemius
> Alameda, CA, USA
>


