[R] Simulating points from GLM corresponding to new x-values

Jacob Nabe-Nielsen jacobnabe at me.com
Fri Aug 14 05:46:58 CEST 2009


Hi Cliff,

Excellent, just what I needed -- wish I had thought of that myself.  
Thanks a lot!

Jacob


On 13 Aug 2009, at 01:49, Clifford Long wrote:

> Hi Jacob,
>
> At the risk of embarrassing myself, I gave it a shot.  I'll throw this
> out on the list, if for no other reason than perhaps someone with
> higher wattage than myself might tear it apart and give you something
> both useful and perhaps elegant (from an R coding standpoint).
>
> (see the following ... it just ran for me, so I hope it will for  
> you, too)
>
> If this isn't what you need, I'll shut up and watch and learn from  
> the others.
>
> Regards,
>
> Cliff
>
>
> I tried to put together something that might work as an example ...
> based on a simple linear regression model, but using the GLM routine.
>
> Once the model was created, I used 'predict' based on the model
> outcome and the original x values.
>
> I then used 'predict' based on the model outcome and new x values,
> along with a function for simulation of the distribution at the new x
> values.
>
> At the
>
>
> #----------------------------------------------
> # Create sample data set to use with GLM
> #   (assume first order linear model for now)
> #----------------------------------------------
> b0 = 10
> b1 = 0.3
> x = sort(rep(seq(1,11, by=2), 10))
>
> fn.y = function(x1){y1 = b0 + b1*x1 + rnorm(n=1, mean=0, sd=1)}
>
> y = sapply(x, fn.y)
>
>
> xydata = data.frame(cbind(x, y))
>
>
> model1 = glm(y ~ x, family = gaussian, data = xydata)
>
>
> #----------------------------------------------
> # Generate new x values
> #   run new x values through 'predict'
> #----------------------------------------------
>
> newx = data.frame(xnew = sort(rep(seq(2,10, by=2), 12)))
>
> y.pred = predict(model1, newx, se.fit = TRUE)
>
>
> #----------------------------------------------
> # Generate simulated values based on new x values
> #   and function based on outcome of 'predict' routine
> #----------------------------------------------
>
> fn.pred = function(fit, sefit){rnorm(n=1, mean=fit,  
> sd=sefit*sqrt(60))}
>
> pred.sim = sapply(y.pred$fit, fn.pred, y.pred$se.fit)
>
>
> #----------------------------------------------
> # Generate simulated values based on orig x values
> #   using 'simulate' routine
> #----------------------------------------------
>
> y.sim = simulate(model1, nsim = 1)
>
>
> #----------------------------------------------
> # Plot original x, y values
> #   then add simulated y values from 'simulate' based on orig x values
> #   and the add simulated values from 'predict' and function based on
> new x values
> #----------------------------------------------
> plot(x, y)
> lines(x, y.sim$sim_1, col='red', type='p')
> lines(newx[,1], pred.sim, col='darkblue', type='p')
>
> #----------------------------------------------
> # END
> #----------------------------------------------
>
>
>
> On Wed, Aug 12, 2009 at 2:51 PM, Jacob Nabe- 
> Nielsen<jacobnabe at me.com> wrote:
>> Hi Cliff -- thanks for the suggestion.
>>
>> I tried extracting the predicted mean and standard error using  
>> predict().
>> Afterwards I simulated the dependent variable using rnorm(), with  
>> mean and
>> standard deviation taken from the predict() function (sd =  
>> sqrt(n)*se). The
>> points obtained this way were scattered far too much (compared to  
>> points
>> obtained with simulate()) -- I am not quite sure why.
>>
>> Unfortunately the documentation of the simulate() function does not  
>> provide
>> much information about how it is implemented, which makes it  
>> difficult to
>> judge which method is best (predict() or simulate(), and it is also  
>> unclear
>> whether simulate() can be applied to glms (with family=gaussian or
>> binomial).
>>
>> Any suggestions for how to proceed?
>>
>> Jacob
>>
>>
>> On 12 Aug 2009, at 13:11, Clifford Long wrote:
>>
>>> Would the "predict" routine (using 'newdata') do what you need?
>>>
>>> Cliff Long
>>> Hollister Incorporated
>>>
>>>
>>>
>>> On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe- 
>>> Nielsen<jacobnabe at me.com>
>>> wrote:
>>>>
>>>> Dear List,
>>>>
>>>> Does anyone know how to simulate data from a GLM object  
>>>> correponding
>>>> to values of the independent (x) variable that do not occur in the
>>>> original dataset?
>>>>
>>>> I have tried using simulate(), but it generates a new value of the
>>>> dependent variable corresponding to each of the original x-values,
>>>> which is not what I need. Ideally I whould like to simulate new  
>>>> values
>>>> for GLM objects both with family="gaussian" and with  
>>>> family="binomial".
>>>>
>>>> Thanks in advance,
>>>> Jacob
>>>>
>>>> Jacob Nabe-Nielsen, PhD, MSc
>>>> Scientist
>>>>  --------------------------------------------------
>>>> Section for Climate Effects and System Modelling
>>>> Department of Arctic Environment
>>>> National Enviornmental Research Institute
>>>> Aarhus University
>>>> Frederiksborgvej 399, Postbox 358
>>>> 4000 Roskilde, Denmark
>>>>
>>>> email: nabe at dmu.dk
>>>> fax: +45 4630 1914
>>>> phone: +45 4630 1944
>>>>
>>>>
>>>>       [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>>




More information about the R-help mailing list