[R] Simulating points from GLM corresponding to new x-values

Clifford Long gnolffilc at gmail.com
Thu Aug 13 01:49:18 CEST 2009


Hi Jacob,

At the risk of embarrassing myself, I gave it a shot.  I'll throw this
out on the list, if for no other reason than perhaps someone with
higher wattage than myself might tear it apart and give you something
both useful and perhaps elegant (from an R coding standpoint).

(see the following ... it just ran for me, so I hope it will for you, too)

If this isn't what you need, I'll shut up and watch and learn from the others.

Regards,

Cliff


I tried to put together something that might work as an example ...
based on a simple linear regression model, but using the GLM routine.

Once the model was created, I used 'predict' based on the model
outcome and the original x values.

I then used 'predict' based on the model outcome and new x values,
along with a function for simulation of the distribution at the new x
values.

At the


#----------------------------------------------
# Create sample data set to use with GLM
#   (assume first order linear model for now)
#----------------------------------------------
b0 = 10
b1 = 0.3
x = sort(rep(seq(1,11, by=2), 10))

fn.y = function(x1){y1 = b0 + b1*x1 + rnorm(n=1, mean=0, sd=1)}

y = sapply(x, fn.y)


xydata = data.frame(cbind(x, y))


model1 = glm(y ~ x, family = gaussian, data = xydata)


#----------------------------------------------
# Generate new x values
#   run new x values through 'predict'
#----------------------------------------------

newx = data.frame(xnew = sort(rep(seq(2,10, by=2), 12)))

y.pred = predict(model1, newx, se.fit = TRUE)


#----------------------------------------------
# Generate simulated values based on new x values
#   and function based on outcome of 'predict' routine
#----------------------------------------------

fn.pred = function(fit, sefit){rnorm(n=1, mean=fit, sd=sefit*sqrt(60))}

pred.sim = sapply(y.pred$fit, fn.pred, y.pred$se.fit)


#----------------------------------------------
# Generate simulated values based on orig x values
#   using 'simulate' routine
#----------------------------------------------

y.sim = simulate(model1, nsim = 1)


#----------------------------------------------
# Plot original x, y values
#   then add simulated y values from 'simulate' based on orig x values
#   and the add simulated values from 'predict' and function based on
new x values
#----------------------------------------------
plot(x, y)
lines(x, y.sim$sim_1, col='red', type='p')
lines(newx[,1], pred.sim, col='darkblue', type='p')

#----------------------------------------------
# END
#----------------------------------------------



On Wed, Aug 12, 2009 at 2:51 PM, Jacob Nabe-Nielsen<jacobnabe at me.com> wrote:
> Hi Cliff -- thanks for the suggestion.
>
> I tried extracting the predicted mean and standard error using predict().
> Afterwards I simulated the dependent variable using rnorm(), with mean and
> standard deviation taken from the predict() function (sd = sqrt(n)*se). The
> points obtained this way were scattered far too much (compared to points
> obtained with simulate()) -- I am not quite sure why.
>
> Unfortunately the documentation of the simulate() function does not provide
> much information about how it is implemented, which makes it difficult to
> judge which method is best (predict() or simulate(), and it is also unclear
> whether simulate() can be applied to glms (with family=gaussian or
> binomial).
>
> Any suggestions for how to proceed?
>
> Jacob
>
>
> On 12 Aug 2009, at 13:11, Clifford Long wrote:
>
>> Would the "predict" routine (using 'newdata') do what you need?
>>
>> Cliff Long
>> Hollister Incorporated
>>
>>
>>
>> On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe-Nielsen<jacobnabe at me.com>
>> wrote:
>>>
>>> Dear List,
>>>
>>> Does anyone know how to simulate data from a GLM object correponding
>>> to values of the independent (x) variable that do not occur in the
>>> original dataset?
>>>
>>> I have tried using simulate(), but it generates a new value of the
>>> dependent variable corresponding to each of the original x-values,
>>> which is not what I need. Ideally I whould like to simulate new values
>>> for GLM objects both with family="gaussian" and with family="binomial".
>>>
>>> Thanks in advance,
>>> Jacob
>>>
>>> Jacob Nabe-Nielsen, PhD, MSc
>>> Scientist
>>>  --------------------------------------------------
>>> Section for Climate Effects and System Modelling
>>> Department of Arctic Environment
>>> National Enviornmental Research Institute
>>> Aarhus University
>>> Frederiksborgvej 399, Postbox 358
>>> 4000 Roskilde, Denmark
>>>
>>> email: nabe at dmu.dk
>>> fax: +45 4630 1914
>>> phone: +45 4630 1944
>>>
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
>




More information about the R-help mailing list