[Rd] suggested modification to the 'mle' documentation?

Luke Tierney luke at stat.uiowa.edu
Fri Dec 7 20:38:34 CET 2007

On Fri, 7 Dec 2007, Gabor Grothendieck wrote:

> On Dec 7, 2007 8:10 AM, Peter Dalgaard <P.Dalgaard at biostat.ku.dk> wrote:
>> Ben Bolker wrote:
>>>   At this point I'd just like to advertise the "bbmle" package
>>> (on CRAN) for those who respectfully disagree, as I do, with Peter over
>>> this issue.  I have added a data= argument to my version
>>> of the function that allows other variables to be passed
>>> to the objective function.  It seems to me that this is perfectly
>>> in line with the way that other modeling functions in R
>>> behave.
>> This is at least cleaner than abusing the "fixed" argument. As you know,
>> I have reservations, one of which is that it is not a given that I want
>> it to behave just like other modeling functions, e.g. a likelihood
>> function might refer to more than one data set, and/or data that are not
>> structured in the traditional data frame format. The design needs more
>> thought than just adding arguments.
>> I still prefer a design based a plain likelihood function. Then we can
>> discuss how to construct such a function so that  the data are
>> incorporated in a flexible way.  There are many ways to do this, I've
>> shown one, here's another:
>>> f <- function(lambda) -sum(dpois(x, lambda, log=T))
>>> d <- data.frame(x=rpois(10000, 12.34))
>>> environment(f)<-evalq(environment(),d)
>>> mle(f, start=list(lambda=10))
>> Call:
>> mle(minuslogl = f, start = list(lambda = 10))
>> Coefficients:
>>  lambda
>> 12.3402
> The explicit environment manipulation is what I was referring to but

I make extensive use of lexical scoping in my programming and I NEVER
use explicit environment manipulaiton--for me that is unreadable, and it
is not amenable to checking using things like codetools.  In the
example above all that is needed is to define x directly, e.g.

     > f <- function(lambda) -sum(dpois(x, lambda, log=T))
     > x <- rpois(10000, 12.34)
     > mle(f, start=list(lambda=10))

     mle(minuslogl = f, start = list(lambda = 10))


It isn't necessary to go through the data frame or environment
munging.  If you want to be able to work with likelihoods for several
data sets at once then you can either use diferent names for the
variables, like

     x1 <- rpois(10000, 12.34)
     f1 <- function(lambda) -sum(dpois(x1, lambda, log=T))
     x2 <- rpois(10000, 12.34)
     f2 <- function(lambda) -sum(dpois(x2, lambda, log=T))

If you are concerned that x, x1, x2 might have been redefined if you
come back to f1, f2 later (not an issue with typical useage inside a
function but can be an issue at top level) then you can create a
closure that cpatures the particular data set you are using.  The
clean way to do this is with a function that creates the negative log
likelihood, e.g.

     makePoisonNegLogLikelihood <- function(x)
 	function(lambda) -sum(dpois(x, lambda, log=T))

Then you can do

     f <- makePoisonNegLogLikelihood(rpois(10000, 12.34))
     mle(f, start=list(lambda=10))

which I find much cleaner and easier to understand than environment
munging.  Once you are defining a likelihood constructor you can think
about things like making it a bit more efficient by calculating
sufficient statistics once, for example

     makePoisonNegLogLikelihood <- function(x) {
         sumX <- sum(x)
         n <- length(x)
 	function(lambda) -dpois(sumX, n * lambda, log=T)



> we can simplify it using proto.  Create a proto object to hold
> f and x then pass the f in the proto object (rather than the
> original f) to mle.  That works because proto automatically resets
> the environment of f when its added to avoiding the evalq.
>> set.seed(1)
>> library(proto)
>> f <- function(lambda) -sum(dpois(x, lambda, log=TRUE))
>> p <- proto(f = f, x = rpois(100, 12.34))
>> mle(p[["f"]], start = list(lambda = 10))
> Call:
> mle(minuslogl = p[["f"]], start = list(lambda = 10))
> Coefficients:
>  lambda
> 12.46000
>> It is not at all an unlikely design to have mle() as a generic function
>> which works on many kinds of objects, the default method being
>> function(object,...) mle(minuslogl(obj)) and minuslogl is an extractor
>> function returning (tada!) the negative log likelihood function.
>>>   (My version also has a cool formula interface and other
>>> bells and whistles, and I would love to get feedback from other
>>> useRs about it.)
>>>    cheers
>>>     Ben Bolker
>> --
>>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

More information about the R-devel mailing list