[R] R question: generating data using MASS

Ben Bolker bbolker at gmail.com
Mon Aug 29 08:29:23 CEST 2011


uf_mike <michael.parent <at> ufl.edu> writes:

> 
> Hi, all! I'm new to R but need to use it to solve a little problem I'm having
> with a paper I'm writing. The question has a few components and I'd
> appreciate guidance on any of them.
> 
> 1. The most essential thing is that I need to generate some multivariate
> normal data on a restricted integer range (1 to 7). I know I can use MASS
> mvrnorm command to do this but have a couple questions about that:
> -I can make the simulated data but I don't know how to issue a command that
> restricts the generated data to be between a specific range (1 to 7), and
> integer-only.

   This problem isn't uniquely defined.  Are you willing to generate
more samples than you need and then throw away observations with
extreme values (truncation)?  Or do you want to 'censor' extreme
values (i.e. set values <= 1 to 1 and values >= 7 to 7)?  For
truncation:

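  x <- MASS::mvrnorm(10000,...)
  ## keep only rows in which every variable lies in [1,7];
  ## x[x>=1 & x<=7] would flatten the matrix to a plain vector
  keep <- rowSums(x<1 | x>7)==0
  x2 <- x[keep,,drop=FALSE]
  x3 <- x2[1:1000,,drop=FALSE]  ## or however many you need
  x4 <- round(x3)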
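
   For the censoring option, a minimal sketch might look like the
following.  The mean of 4, SD of 1.5, correlation of 0.3 and sample
size of 1000 are made-up values for illustration only, not anything
taken from your question.

```r
library(MASS)
set.seed(101)
## illustrative values only: 5 variables, mean 4, sd 1.5, correlation 0.3
Sigma <- matrix(0.3,nrow=5,ncol=5)
diag(Sigma) <- 1
Sigma <- Sigma*1.5^2
x <- mvrnorm(1000,mu=rep(4,5),Sigma=Sigma)
## censor: move anything outside [1,7] to the nearest bound, then round
xc <- round(pmin(pmax(x,1),7))
range(xc)
```

   Either way, truncating/censoring and rounding will generally shift
the means, variances and correlations somewhat away from the values
you asked for, so it is worth checking cor(xc) afterwards.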


> -Is there a way to specify a single desired correlation between all the
> variables (i.e., I want, say, five variables to all be correlated about .30
> with each other), rather than input the entire covariance matrix as sigma?

   What's wrong with

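  m <- matrix(0.3,nrow=5,ncol=5)  ## correlation 0.3 everywhere ...
  diag(m) <- 1                    ## ... except 1 on the diagonal
  m <- m*variance  ## scale by the common variance to get a covariance matrix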

  ?
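
   As a rough check that this gives what you want, something along
these lines should work.  The value variance=2, the means of 4 and
n=1000 are arbitrary choices for the sake of the example.

```r
library(MASS)
set.seed(101)
variance <- 2   ## hypothetical common variance, for illustration
m <- matrix(0.3,nrow=5,ncol=5)
diag(m) <- 1
m <- m*variance
x <- mvrnorm(1000,mu=rep(4,5),Sigma=m)
round(cor(x),2)  ## off-diagonal entries should be close to 0.3
## empirical=TRUE makes the *sample* covariance match m exactly
x_exact <- mvrnorm(1000,mu=rep(4,5),Sigma=m,empirical=TRUE)
round(cor(x_exact),2)
```

   Note that the correlations apply to the continuous values; rounding
and restricting the range afterwards will change them a bit.
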
> 
> 2. I need to introduce missing data (NA) AFTER generating the data set, and
> I need it to be random and at a specific prevalence (say, 5%). Is there a
> simple way to take the initial data set and randomly replace 5% of values
> with NA missing values?

  ## pick 5% of all cells (rounded to a whole number) at random
  x4[sample(length(x4),size=round(0.05*length(x4)))] <- NA
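
   A self-contained version of the same idea, using a made-up 1000 x 5
matrix of integers as a stand-in for your simulated data (the
dimensions and the uniform sampling are assumptions for illustration):

```r
set.seed(101)
## stand-in for the simulated data: 1000 x 5 integers from 1 to 7
x4 <- matrix(sample(1:7,5000,replace=TRUE),ncol=5)
n_miss <- round(0.05*length(x4))   ## 5% of all cells
x4[sample(length(x4),size=n_miss)] <- NA
mean(is.na(x4))      ## overall proportion missing
colMeans(is.na(x4))  ## per-variable proportions
```

   This fixes the overall proportion of missing cells at 5%; the
proportion within each variable will vary around that.  If you would
rather have each value go missing independently with probability
0.05, you could index with runif(length(x4))<0.05 instead.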