[R] model syntax processed --- probably common

Mon Aug 19 22:47:17 CEST 2013

When I want to manipulate expressions, including formulae, I first
think of things like bquote() and substitute().   E.g.,

> for(nm in lapply(c("x","z"), as.name)) {
        fmla <- formula( bquote( y ~ .(nm) )) 
        print(fama.macbeth(fmla, din=d))
}
(Intercept)           x
-0.02384804  0.18151577
(Intercept)           z
 0.05562026  0.03174173
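When the regressor names arrive as character strings (as with nm in c("x", "z")), base R's reformulate() builds the formula directly. as.formula(paste(...)) is the older string-based equivalent. This is a minimal sketch that assumes a simplified version of the fama.macbeth() from the quoted message below, with lm(formula, data = dd) in place of lm(model.frame(...)). It also uses simulated data with an arbitrary seed, so its numbers will not match the output above:

```r
## Simplified version of the fama.macbeth() from the quoted message:
## one regression per month, then the average of the monthly coefficients.
fama.macbeth <- function(formula, din) {
  monthly.regressions <- by(din, as.factor(din$month),
                            function(dd) coef(lm(formula, data = dd)))
  colMeans(do.call("rbind", monthly.regressions))
}

set.seed(1)  # arbitrary seed; simulated data in the thread's layout
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50), z = rnorm(50))

for (nm in c("x", "z")) {
  ## reformulate("x", response = "y") gives y ~ x, so the coefficient
  ## is labelled "x" rather than "nm"; as.formula(paste("y ~", nm)) also works
  fmla <- reformulate(nm, response = "y")
  print(fama.macbeth(fmla, din = d))
}
```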

Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ivo welch
> Sent: Monday, August 19, 2013 1:05 PM
> To: David Winsemius; r-help
> Subject: Re: [R] model syntax processed --- probably common
> 
> thank you.  but uggh...sorry for my html post.  and sorry again for
> having been obscure in my attempt to be brief.  here is a working
> program.
> 
> fama.macbeth <- function( formula, din ) {
>   fnames <- terms( formula )
>   dnames <- names( din )
>   stopifnot( all(dimnames(attr(fnames, "factors"))[[1]] %in%  dnames) )
> 
>   monthly.regressions <- by( din, as.factor(din$month), function(dd)
>     coef(lm(model.frame( formula, data=dd ))))
>   as.m <- do.call("rbind", monthly.regressions)
>   colMeans(as.m)
> }
> 
> ## a test data set
> d <- data.frame( month=rep(1:5,10), y= rnorm(50), x= rnorm(50), z=rnorm(50) )
> 
> ## this works beautifully, exactly how I want it.  the names are
> ## there, the formula works.
> print(fama.macbeth( y ~ x , din=d ))
> 
> ## now I want something like the following statement to work, too
> for (nm in c("x")) print(fama.macbeth( y ~ nm, din=d ))
> ##   or
> for (nm in c("x")) print(fama.macbeth( y ~ d[[nm]], din=d ))
> ##   or whatever.
> 
> the output in both cases should be the same as for y ~ x, preferably even knowing
> that the name of the variable is really "x" and not nm.  is there a
> standard common way to do this?
> 
> regards,
> 
> /iaw
> 
> ----
> Ivo Welch (ivo.welch at gmail.com)
> http://www.ivo-welch.info/
> J. Fred Weston Professor of Finance
> Anderson School at UCLA, C519
> Director, UCLA Anderson Fink Center for Finance and Investments
> Free Finance Textbook, http://book.ivo-welch.info/
> Editor, Critical Finance Review, http://www.critical-finance-review.org/
> 
> 
> 
> On Mon, Aug 19, 2013 at 12:48 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
> >
> > On Aug 19, 2013, at 9:45 AM, ivo welch wrote:
> >
> >> dear R experts---I was programming a Fama-MacBeth panel regression (a
> >> Fama-MacBeth regression is essentially T cross-sectional regressions, with
> >> statistics then obtained from the time series of coefficients), partly
> >> because I wanted faster speed than plm, partly because I wanted some
> >> additional features.
> >>
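The two steps described here can be sketched in plain base R. The first step runs one cross-sectional regression per period. The second computes statistics from the time series of the estimated coefficients. The helper name fm.sketch and its period argument are hypothetical. The standard error is the usual Fama-MacBeth one: the sample standard deviation of the T per-period coefficients divided by sqrt(T), with no autocorrelation correction. This sketch is not the poster's implementation:

```r
## Hypothetical helper (not from the thread): Fama-MacBeth estimates
## from a panel data frame whose 'period' column labels the cross-sections.
fm.sketch <- function(formula, din, period = "month") {
  ## Step 1: one cross-sectional OLS regression per period (T rows of coefficients)
  slices <- split(din, din[[period]])
  coefs <- do.call(rbind, lapply(slices, function(dd) coef(lm(formula, data = dd))))
  ## Step 2: statistics from the time series of the coefficient vectors
  n.periods <- nrow(coefs)
  est <- colMeans(coefs)
  se <- apply(coefs, 2, sd) / sqrt(n.periods)
  cbind(estimate = est, std.error = se, t.value = est / se)
}

set.seed(1)  # arbitrary seed; simulated data in the thread's layout
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50))
print(fm.sketch(y ~ x, din = d))
```

The sd/sqrt(T) standard error treats the monthly estimates as roughly independent draws over time.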
> >> my function starts as
> >>
> >> fama.macbeth <- function( formula, din ) {
> >>   names <- terms( formula )
> >>  ## omitted : I want an immediate check that the formula refers to
> >>  ## existing variables in the data frame with English error messages
> >>
> >
> > Look at the structure of a terms() result from a formula argument with str():
> >
> >  fama.macbeth <- function( formula, din ) {
> >    fnames <- terms( formula ) ; str(fnames)
> >  }
> >
> >> fama.macbeth( x ~ y, data.frame(x=rnorm(10), y=rnorm(10) ) )
> > Classes 'terms', 'formula' length 3 x ~ y
> >   ..- attr(*, "variables")= language list(x, y)
> >   ..- attr(*, "factors")= int [1:2, 1] 0 1
> >   .. ..- attr(*, "dimnames")=List of 2
> >   .. .. ..$ : chr [1:2] "x" "y"
> >   .. .. ..$ : chr "y"
> >   ..- attr(*, "term.labels")= chr "y"
> >   ..- attr(*, "order")= int 1
> >   ..- attr(*, "intercept")= int 1
> >   ..- attr(*, "response")= int 1
> >   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
> >
> > Then extract the dimnames from the "factors" attribute and compare them to the names in
> > the data object:
> >
> >> fama.macbeth <- function( formula, din ) {
> >   fnames <- terms( formula ) ;  dnames <- names( din)
> >   dimnames(attr(fnames, "factors"))[[1]] %in%  dnames
> > }
> >> fama.macbeth( x ~ y, data.frame(x=rnorm(10), y=rnorm(10) ) )
> > #[1] TRUE TRUE
> >
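A variant of this check could give the English error message the original poster asked for. It uses all.vars(), which returns the variable names inside a formula, so a term such as log(x) is checked as x. The rows of the "factors" dimnames would instead contain "log(x)", which is not a column name. The helper name check.formula.vars and the wording of its message are hypothetical:

```r
## Hypothetical helper: stop with a readable message if the formula
## uses variables that are not columns of the data frame.
check.formula.vars <- function(formula, din) {
  missing.vars <- setdiff(all.vars(formula), names(din))
  if (length(missing.vars) > 0)
    stop("formula refers to variable(s) not in the data frame: ",
         paste(missing.vars, collapse = ", "), call. = FALSE)
  invisible(TRUE)
}

d <- data.frame(x = rnorm(10), y = rnorm(10))
check.formula.vars(y ~ log(x), d)  # passes: all.vars() gives "y", "x"
msg <- tryCatch(check.formula.vars(y ~ myx, d), error = conditionMessage)
print(msg)
```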
> >
> > I couldn't tell if this was the main thrust of your question. It seems to meander a bit.
> >
> > --
> > David.
> >
> >>   monthly.regressions <- by( din, as.factor(din$month), function(dd)
> >>     coef(lm(model.frame( formula, data=dd ))))
> >>   as.m <- do.call("rbind", monthly.regressions)
> >>   colMeans(as.m)  ## or something like this.
> >> }
> >> say my data frame mydata has columns named month, r, laggedx and ... .  I
> >> can call this function
> >>
> >>   fama.macbeth( r ~ laggedx, din=mydata )
> >>
> >> but it fails
> >
> > What fails?
> >
> >
> >> if I want to compute my x variables.  for example,
> >>
> >>   myx <- mydata[,"laggedx"]
> >>   fama.macbeth( r ~ myx, din=mydata )
> >>
> >> I also wish that the computed myx still remembered that it was really
> >> laggedx.  it's almost as if I should not create a vector myx but a data
> >> frame myx to avoid losing the column name.
> >
> > I wouldn't say "almost"... rather, that is exactly what you should do. R regression
> > methods almost always work better when formulas are interpreted in the environment of
> > the data argument.
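The following illustrates the point about keeping computed variables in the data, using a made-up stand-in for the poster's mydata. Single-bracket indexing with drop = FALSE keeps a one-column data frame and its name. A free-standing vector does not follow the per-month subsetting, which is one likely way the r ~ myx call fails. A derived regressor stored as a column is found by name. The column name sq.laggedx is only an example:

```r
set.seed(1)  # made-up stand-in for the poster's 'mydata'
mydata <- data.frame(month = rep(1:5, 10), r = rnorm(50), laggedx = rnorm(50))

myx <- mydata[, "laggedx"]                   # numeric vector: column name is gone
myx.df <- mydata[, "laggedx", drop = FALSE]  # one-column data frame: name kept
print(names(myx.df))

## 'r' comes from the 10-row monthly subset, but 'myx' is found in the
## formula's environment with all 50 values, so model.frame() stops.
one.month <- mydata[mydata$month == 1, ]
err <- tryCatch(lm(r ~ myx, data = one.month), error = conditionMessage)
print(err)

## Storing the computed regressor as a column lets the formula find it by name
mydata$sq.laggedx <- mydata$laggedx^2  # hypothetical derived variable
print(coef(lm(r ~ sq.laggedx, data = mydata)))
```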
> >
> >>  I wonder why such vectors don't
> >> keep a name attribute of some sort.
> >>
> >> there is probably an "R way" of doing this.  is there?
> >>
> >> /iaw
> >>
> >> ----
> >> Ivo Welch (ivo.welch at gmail.com)
> >>
> >>       [[alternative HTML version deleted]]
> >
> > Still posting HTML?
> >
> >>
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > And do explain what the goal is.
> >
> > --
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help