[R] Data Simulation in R

Uwe Ligges ligges at statistik.uni-dortmund.de
Wed Jan 19 13:49:26 CET 2005


Doran, Harold wrote:

> Thanks, but I think I am already doing that. I use rm() and gc() as the
> code moves along. The datasets are stored as a list. Is there a way to
> save the list object and then load each dataset in the list one at a
> time, or must the entire list be in memory at once?

The whole list is held in memory, and it has to be for you to access its
elements. Either save the list elements to separate files or, even better,
use a database.
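A minimal sketch of the separate-files approach, assuming base R's saveRDS() and readRDS() and a scratch directory under tempdir(). The directory and file names, the small N and sample.size, and the mean() call standing in for the real analysis are illustrative choices, not part of the original thread. Only one dataset is in memory at any time:

```r
# Sketch: write each simulated dataset to its own .rds file, then read the
# files back one at a time. Names and sizes here are illustrative assumptions.
library(MASS)  # mvrnorm()

mu    <- c(100, 150, 200, 250)
Sigma <- matrix(80, 4, 4); diag(Sigma) <- 400
N           <- 5    # number of datasets (small for illustration)
sample.size <- 100  # rows per dataset (small for illustration)

out.dir <- file.path(tempdir(), "sim")
unlink(out.dir, recursive = TRUE)           # start from an empty directory
dir.create(out.dir, showWarnings = FALSE)

# Generate and save one dataset per iteration; nothing accumulates in memory
for (i in seq_len(N)) {
  d <- as.data.frame(cbind(seq_len(sample.size),
                           mvrnorm(n = sample.size, mu, Sigma)))
  saveRDS(d, file.path(out.dir, sprintf("dataset_%03d.rds", i)))
  rm(d)
}
invisible(gc())

# Later: load one dataset, analyse it, keep only the small result
files <- list.files(out.dir, pattern = "^dataset_.*\\.rds$", full.names = TRUE)
col.means <- sapply(files, function(f) {
  d <- readRDS(f)
  mean(d$V2)   # placeholder for the real analysis
})
```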

Uwe Ligges




> Harold
> 
> -----Original Message-----
> From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de] 
> Sent: Wednesday, January 19, 2005 5:51 AM
> To: Doran, Harold
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Data Simulation in R
> 
> Doran, Harold wrote:
> 
> 
>>Dear List:
>>
>>A few weeks ago I posted some questions regarding data simulation and 
>>received some very helpful comments, thank you. I have modified my 
>>code accordingly and have made some progress.
>>
>>However, I am now facing a new challenge along similar lines. I am
>>attempting to simulate 250 datasets and then run the data through a
>>linear model. I use rm() and gc() as I move along to clean up the
>>workspace and preserve memory. My aim, though, is to use sample sizes
>>of 5,000 and 10,000. By any measure this is a huge task.
>>
>>My machine is a Pentium 4 2.8 GHz with 2GB RAM running Windows XP.
>>
>>I have the following in the "target" section of the Windows shortcut:
>>--max-mem-size=1812M
>>
>>With such large samples, R is unable to perform the analysis, at least
>>with the code I have developed: it halts when it runs out of memory. A
>>colleague subsequently ran the simulation in another software program
>>on a similar computer and, while it took overnight (and then some),
>>the program produced the desired results.
>>
>>I am curious whether such large simulations are beyond the reach of R,
>>or whether my code is simply not organized well enough to perform the
>>simulation.
>>
>>I would appreciate any thoughts or advice.
> 
> 
> 
> Don't hold all datasets (and results, if they are big) in memory at
> the same time!!!
> 
> Either generate each dataset when you use it and delete it afterwards,
> or save them to disk and load them one by one for further analyses.
> Also, you might want to call gc() after you have removed large objects...
> 
> Uwe Ligges
> 
> 
> 
> 
>>Harold
>>
>>
>>
>>library(MASS)
>>library(nlme)
>>mu<-c(100,150,200,250)
>>Sigma<-matrix(c(400,80,80,80,80,400,80,80,80,80,400,80,80,80,80,400),4,4)
>>mu2<-c(0,0,0)
>>Sigma2<-diag(64,3)
>>sample.size<-5000
>>N<-250 #Number of datasets
>>#Take a single draw from VL distribution
>>vl.error<-mvrnorm(n=N, mu2, Sigma2)
>>
>>#Step 1 Create Data
>>#V1 is a unique subject id; V2-V5 are the four simulated scores
>>Data <- lapply(seq(N), function(x)
>>as.data.frame(cbind(seq_len(sample.size), mvrnorm(n=sample.size, mu, Sigma))))
>>
>>#Step 2 Add Vertical Linking Error
>>for(i in seq(along=Data)){
>>Data[[i]]$V6 <- Data[[i]]$V2
>>Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i,1]
>>Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i,2]
>>Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i,3] }
>>
>>#Step 3 Restructure for Longitudinal Analysis
>>#(refer to each dataset as x inside the function, not as Data[[i]])
>>long <- lapply(Data, function(x) reshape(x, idvar="V1",
>>varying=list(names(x)[2:5], names(x)[6:9]),
>>v.names=c("score.1","score.2"), direction="long"))
>>
>>#####################
>>#Clean up Workspace
>>rm(Data,vl.error)
>>gc()
>>#####################
>>
>># Step 4 Run GLS
>>
>>glsrun1 <- lapply(long, function(x) gls(score.1~I(time-1), data=x, 
>>correlation=corAR1(form=~1|V1), method='ML'))
>>
>># Extract intercepts and slopes
>>int1 <- sapply(glsrun1, function(x) coef(x)[1])
>>slo1 <- sapply(glsrun1, function(x) coef(x)[2])
>>
>>################
>>#Clean up workspace
>>rm(glsrun1)
>>gc()
>>
>>glsrun2 <- lapply(long, function(x) gls(score.2~I(time-1), data=x, 
>>correlation=corAR1(form=~1|V1), method='ML'))
>>
>># Extract intercepts and slopes
>>int2 <- sapply(glsrun2, function(x) coef(x)[1])
>>slo2 <- sapply(glsrun2, function(x) coef(x)[2])
>>
>> 
>>#Clean up workspace
>>rm(glsrun2)
>>gc()
>>
>>
>>
>># Print Results
>>
>>cat("Original Standard Errors","\n", "Intercept","\t", 
>>sd(int1),"\n","Slope","\t","\t", sd(slo1),"\n")
>>
>>cat("Modified Standard Errors","\n", "Intercept","\t", 
>>sd(int2),"\n","Slope","\t","\t", sd(slo2),"\n")
>>
>>rm(list=ls())
>>gc()
>>
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html
> 
>
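To see why holding all 250 datasets at once is a problem, the memory taken by a single dataset can be measured with object.size() and scaled up. This is a rough sketch, assuming the settings from the quoted code (sample.size = 5000, N = 250); it measures one wide dataset plus its long-format copy, ignores the gls fits and temporary copies made during fitting, and object.size() may double-count columns that share memory, so treat the result as an estimate only:

```r
# Rough memory estimate for keeping every dataset (wide and long) in memory
# at once, using the quoted simulation settings. Figures are approximate.
library(MASS)  # mvrnorm()

mu    <- c(100, 150, 200, 250)
Sigma <- matrix(80, 4, 4); diag(Sigma) <- 400
sample.size <- 5000
N           <- 250

# One wide dataset with the nine columns V1-V9 used in the quoted code
one <- as.data.frame(cbind(seq_len(sample.size),
                           mvrnorm(n = sample.size, mu, Sigma)))
one$V6 <- one$V2; one$V7 <- one$V3; one$V8 <- one$V4; one$V9 <- one$V5

# Its long-format copy, as produced in Step 3
one.long <- reshape(one, idvar = "V1",
                    varying = list(names(one)[2:5], names(one)[6:9]),
                    v.names = c("score.1", "score.2"), direction = "long")

per.wide <- as.numeric(object.size(one))        # bytes, one wide dataset
per.long <- as.numeric(object.size(one.long))   # bytes, one long dataset
total.mb <- N * (per.wide + per.long) / 2^20    # all N datasets, in MB
total.mb
```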

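The reshape() step relies on V1 holding one unique id per simulated subject: an id column such as 1:10, recycled by cbind() across 5,000 rows, would repeat every ten rows, while reshape() builds the long-format row names from the id and the time. A small check, assuming a toy dataset of n = 20 subjects and simply copying V2-V5 into V6-V9 as stand-ins for the linked scores:

```r
# Toy check of the wide-to-long step: with unique ids in V1, every subject
# should appear exactly once at each of the four occasions.
library(MASS)  # mvrnorm()

n <- 20
Sigma <- matrix(80, 4, 4); diag(Sigma) <- 400
d <- as.data.frame(cbind(seq_len(n),
                         mvrnorm(n = n, c(100, 150, 200, 250), Sigma)))
d[paste0("V", 6:9)] <- d[paste0("V", 2:5)]   # stand-in for the linked scores

stopifnot(anyDuplicated(d$V1) == 0)          # ids must be unique first

long <- reshape(d, idvar = "V1",
                varying = list(names(d)[2:5], names(d)[6:9]),
                v.names = c("score.1", "score.2"), direction = "long")
table(long$time)   # n rows at each occasion
```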

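The advice to generate each dataset when it is needed and delete it afterwards can be sketched as one loop that simulates, reshapes and fits a single replication at a time, keeping only the four gls coefficients. This is a sketch, assuming the simulation settings and model from the quoted code; N and sample.size are reduced so it runs quickly, and the est matrix is a hypothetical container introduced for illustration:

```r
# "Generate, analyse, discard": only the coefficients survive each pass.
library(MASS)  # mvrnorm()
library(nlme)  # gls(), corAR1()

set.seed(2005)
mu     <- c(100, 150, 200, 250)
Sigma  <- matrix(80, 4, 4); diag(Sigma) <- 400
mu2    <- c(0, 0, 0)
Sigma2 <- diag(64, 3)
sample.size <- 50   # reduced for illustration
N           <- 3    # reduced for illustration

vl.error <- mvrnorm(n = N, mu2, Sigma2)

# Small results matrix instead of lists of datasets and fitted models
est <- matrix(NA_real_, nrow = N, ncol = 4,
              dimnames = list(NULL, c("int1", "slo1", "int2", "slo2")))

for (i in seq_len(N)) {
  # Simulate one wide dataset and add the vertical linking error
  d <- as.data.frame(cbind(seq_len(sample.size),
                           mvrnorm(n = sample.size, mu, Sigma)))
  d$V6 <- d$V2
  d$V7 <- d$V3 + vl.error[i, 1]
  d$V8 <- d$V4 + vl.error[i, 2]
  d$V9 <- d$V5 + vl.error[i, 3]
  # Restructure and fit both models
  long <- reshape(d, idvar = "V1",
                  varying = list(names(d)[2:5], names(d)[6:9]),
                  v.names = c("score.1", "score.2"), direction = "long")
  fit1 <- gls(score.1 ~ I(time - 1), data = long,
              correlation = corAR1(form = ~ 1 | V1), method = "ML")
  fit2 <- gls(score.2 ~ I(time - 1), data = long,
              correlation = corAR1(form = ~ 1 | V1), method = "ML")
  est[i, ] <- c(coef(fit1), coef(fit2))
  rm(d, long, fit1, fit2)   # drop the large objects before the next pass
}
invisible(gc())

apply(est, 2, sd)   # empirical standard errors across replications
```

With N = 250 and sample.size = 5000 the loop takes correspondingly longer, but its memory use should stay near that of a single replication, since no list of datasets or fitted models is kept.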

