[R] Data Simulation in R

Kjetil Brinchmann Halvorsen kjetil at acelerate.com
Wed Jan 19 16:07:11 CET 2005


Uwe Ligges wrote:

> Doran, Harold wrote:
>
>> Thanks, but I think I am doing that. I use rm() and gc() as the code
>> moves along. The datasets are stored as a list. Is there a way to
>> save the list object and then call each dataset within the list one
>> at a time, or must the entire list be in memory at once?
>
>
> The list is in memory - and it must be, in order to access its elements.
> Either save the list elements to separate files or, even better, make
> use of a database.
>
> Uwe Ligges
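
Following Uwe's first suggestion, something along these lines should
work for the save-to-separate-files route (an untested sketch; the
file names are my own invention):

library(MASS)

mu    <- c(100, 150, 200, 250)
Sigma <- matrix(80, 4, 4); diag(Sigma) <- 400   # 400 diagonal, 80 off
N     <- 250

## Simulate, save each dataset to its own file, and discard it:
for (i in seq(N)) {
    dat <- as.data.frame(mvrnorm(n = 5000, mu, Sigma))
    save(dat, file = paste("dataset", i, ".RData", sep = ""))
    rm(dat)
}

## Later, load and analyse one dataset at a time:
for (i in seq(N)) {
    load(paste("dataset", i, ".RData", sep = ""))   # recreates 'dat'
    ## ... fit the model to 'dat' here ...
    rm(dat); gc()
}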

Or, since the data are simulated, why not just (re)simulate each
dataset right before it is used and delete it afterwards, saving the
random seed so that you can re-create any dataset if needed?
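
A rough sketch of that idea (untested; it assumes simple integer seeds
are good enough):

library(MASS)

make.data <- function(i, n = 5000) {
    set.seed(i)                       # a known seed per dataset
    mu    <- c(100, 150, 200, 250)
    Sigma <- matrix(80, 4, 4); diag(Sigma) <- 400
    as.data.frame(mvrnorm(n = n, mu, Sigma))
}

for (i in 1:250) {
    dat <- make.data(i)               # simulate just before use
    ## ... fit the model to 'dat' here ...
    rm(dat); gc()                     # then throw it away again
}

## Any single dataset can be recreated later, e.g. dat17 <- make.data(17)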

Kjetil

>
>> Harold
>>
>> -----Original Message-----
>> From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de]
>> Sent: Wednesday, January 19, 2005 5:51 AM
>> To: Doran, Harold
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] Data Simulation in R
>>
>> Doran, Harold wrote:
>>
>>
>>> Dear List:
>>>
>>> A few weeks ago I posted some questions regarding data simulation
>>> and received some very helpful comments; thank you. I have modified
>>> my code accordingly and have made some progress.
>>>
>>> However, I am now facing a new challenge along similar lines. I am
>>> attempting to simulate 250 datasets and then run each one through a
>>> linear model. I use rm() and gc() as I move along to clean up the
>>> workspace and preserve memory. My aim, however, is to use sample
>>> sizes of 5,000 and 10,000, which by any measure is a huge task.
>>>
>>> My machine has 2 GB of RAM and a 2.8 GHz Pentium 4 processor, running
>>> Windows XP. I have the following in the "target" section of the
>>> Windows shortcut: --max-mem-size=1812M
>>>
>>> With such large samples, R is unable to perform the analysis, at
>>> least with the code I have developed; it halts when it runs out of
>>> memory. A colleague subsequently constructed the simulation in
>>> another software package on a similar computer and, while it took
>>> overnight (and then some), it produced the desired results.
>>>
>>> I am curious whether such large simulations are simply beyond R's
>>> grasp, or whether my code is not organized well enough to perform
>>> them.
>>>
>>> I would appreciate any thoughts or advice.
>>
>> Don't hold all the datasets (and results, if they are big) in memory
>> at the same time!
>>
>> Either generate each dataset when you use it and delete it afterwards,
>> or save them to disk and load them one by one for further analyses.
>> Also, you might want to call gc() after removing large objects...
>>
>> Uwe Ligges
>>
>>> Harold
>>>
>>> library(MASS)
>>> library(nlme)
>>> mu<-c(100,150,200,250)
>>> Sigma<-matrix(c(400, 80, 80, 80,
>>>                  80,400, 80, 80,
>>>                  80, 80,400, 80,
>>>                  80, 80, 80,400), 4, 4)
>>> mu2<-c(0,0,0)
>>> Sigma2<-diag(64,3)
>>> sample.size<-5000
>>> N<-250 #Number of datasets
>>> #Take a single draw from the VL (vertical linking) error distribution
>>> vl.error<-mvrnorm(n=N, mu2, Sigma2)
>>>
>>> #Step 1 Create Data
>>> Data <- lapply(seq(N), function(x)
>>> as.data.frame(cbind(1:10,mvrnorm(n=sample.size, mu, Sigma))))
>>>
>>> #Step 2 Add Vertical Linking Error
>>> for(i in seq(along=Data)) {
>>>     Data[[i]]$V6 <- Data[[i]]$V2
>>>     Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i,1]
>>>     Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i,2]
>>>     Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i,3]
>>> }
>>>
>>> #Step 3 Restructure for Longitudinal Analysis
>>> # ("id" becomes a fresh row identifier in the long data)
>>> long <- lapply(Data, function(x)
>>>     reshape(x, idvar="id",
>>>             varying=list(names(x)[2:5], names(x)[6:9]),
>>>             v.names=c("score.1","score.2"), direction="long"))
>>>
>>> #####################
>>> #Clean up Workspace
>>> rm(Data,vl.error)
>>> gc()
>>> #####################
>>>
>>> # Step 4 Run GLS
>>>
>>> glsrun1 <- lapply(long, function(x)
>>>     gls(score.1~I(time-1), data=x,
>>>         correlation=corAR1(form=~1|V1), method='ML'))
>>>
>>> # Extract intercepts and slopes
>>> int1 <- sapply(glsrun1, function(x) coef(x)[1])
>>> slo1 <- sapply(glsrun1, function(x) coef(x)[2])
>>>
>>> ################
>>> #Clean up workspace
>>> rm(glsrun1)
>>> gc()
>>>
>>> glsrun2 <- lapply(long, function(x)
>>>     gls(score.2~I(time-1), data=x,
>>>         correlation=corAR1(form=~1|V1), method='ML'))
>>>
>>> # Extract intercepts and slopes
>>> int2 <- sapply(glsrun2, function(x) coef(x)[1])
>>> slo2 <- sapply(glsrun2, function(x) coef(x)[2])
>>>
>>>
>>> #Clean up workspace
>>> rm(glsrun2)
>>> gc()
>>>
>>> # Print Results
>>>
>>> cat("Original Standard Errors","\n", "Intercept","\t", 
>>> sd(int1),"\n","Slope","\t","\t", sd(slo1),"\n")
>>>
>>> cat("Modified Standard Errors","\n", "Intercept","\t", 
>>> sd(int2),"\n","Slope","\t","\t", sd(slo2),"\n")
>>>
>>> rm(list=ls())
>>> gc()
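
Concretely, the whole script could be reorganized so that only one
simulated dataset exists at a time and only the fitted coefficients
accumulate. An untested sketch, reusing the names from the script
above:

library(MASS)
library(nlme)

mu     <- c(100, 150, 200, 250)
Sigma  <- matrix(80, 4, 4); diag(Sigma) <- 400
mu2    <- c(0, 0, 0)
Sigma2 <- diag(64, 3)
sample.size <- 5000
N <- 250

vl.error <- mvrnorm(n = N, mu2, Sigma2)
res <- matrix(NA, N, 4,
              dimnames = list(NULL, c("int1", "slo1", "int2", "slo2")))

for (i in seq(N)) {
    x <- as.data.frame(cbind(1:10, mvrnorm(n = sample.size, mu, Sigma)))
    x$V6 <- x$V2                        # add vertical linking error
    x$V7 <- x$V3 + vl.error[i, 1]
    x$V8 <- x$V4 + vl.error[i, 2]
    x$V9 <- x$V5 + vl.error[i, 3]
    x <- reshape(x, idvar = "id",
                 varying = list(names(x)[2:5], names(x)[6:9]),
                 v.names = c("score.1", "score.2"), direction = "long")
    fit1 <- gls(score.1 ~ I(time - 1), data = x,
                correlation = corAR1(form = ~ 1 | V1), method = "ML")
    fit2 <- gls(score.2 ~ I(time - 1), data = x,
                correlation = corAR1(form = ~ 1 | V1), method = "ML")
    res[i, ] <- c(coef(fit1), coef(fit2))
    rm(x, fit1, fit2)                   # nothing big survives the loop
}

apply(res, 2, sd)                       # empirical standard errors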


-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra




