# [R] Generating Data using Formulas

Charles C. Berry cberry at tajo.ucsd.edu
Thu May 31 05:58:08 CEST 2007

```Christian,

The formula language is not suited to such recursive useage
AFAICS.

You can _vectorize_ your code like this:

cmat <- outer( 1:25, 1:25, function(y,x) ifelse( x>y, 0, 0.8^(y-x) ) )
res <- replicate(1000,{
y <- 1 + cmat %*% rnorm(25)
coef(lm(y[-1]~y[-25]))
})
rowMeans(res) # mean of 1000 replicates

HTH,

Chuck

On Tue, 29 May 2007, Chrisitan Falde wrote:

> Hello,
>
> My name is Christian Falde.  I am new to R.
>
> My problem is this.  I am attempting to learn R on my own. In so doing I
> am using some problems from Davidson and MacKinnon Econometric Theory
> and Methods to do so.  This is because I can already do the some of the
> problems in SAS so I am attempting to rework them using R. Seemed
> logical to me, now I am stuck and its really bugging me.
>
>
> The problem is this
>
> Generate a data set sample size of 25 with the formula y=1+.8*y(t-1)+ u.
> Where y is the dependent, y(t-1) is the dependent variable lagged one
> peroid, and u is the classical error term.  Assume y0=0 and the u is
> NID(0,1). Use this sample to compute the OLS estimates B1 (1) and
> B2(.8).  Repeat at least 100 times and find the average of the B's.
> Use these average to estimate the bias of the ols estimators.
>
> To start I did the following non lagged program.
>
> final<-function(i,j){x<-function(i) {10*i}
> y<-function(i,j) {1+.8*10*i+100*rnorm(j)}
> datathreeone<- data.frame(replicate(100,coef(lm(y(i,j)~x(i)))))
> rowMeans(datathreeone)}
> final(1:25,25)
> final(1:50,50)
> final(1:100,100)
> final(1:200,200)
> final(1:10000,10000)
>
>
> Now the "only" thing I need to to is change ".8*10*i"  which is
> exogenous to ".8* y(t-1) ".
>
> There are two reasons why I did it this way. I needed the rnorm(i) to
> generate a new set of u's each replication, and I wanted to be able to
> use the function as i did to make the results more concise.
>
> For the lag in SAS we used an if then else logic relating to the number
> of observation.  This in R would have to be linked to the invisable row
> number.  I think I need an index variable for the row.  Perhaps, sorry
> thinking while typing.
>
> Another reason why I am stuck, the lag function was seemingly straight forward.
>
> lag (x, k=1)
>
> yet x has to be a matrix so when I tried to do it like above with y as a
> function R complained.
>
> I have been working on this for a couple of days now so everything is
> begining to not make sense.  It just seems to me to get the matrix to
> work out I would need to have two matrices.
>
> dependent        and           explanatory
> y1                 =     sum (  1 +.8*0 + 100*rnorm(i))
> y2                 =     sum ( 1 +.8* (dependent row 1) + 100*rnorm(i))
> etc
>
> I just am not sure how to do that.
>
>
> christian falde
>
[snip]

Charles C. Berry                        (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901

```