[R] Creating missing values.
feldesmanm at pdx.edu
Sun Mar 24 18:12:31 CET 2002
I'm trying to figure out whether there is a simple one or two-pass approach
to randomly creating missing values for a set of existing (complete)
data. For example, I want to randomly make 10% of the entries in the Iris
dataset missing (i.e. NA). I don't want any case to have all missing
values and I don't want any case to be missing the classification
variable. I can do this in about 3 passes, but I haven't figured out
whether there is an efficient way to do this in one or two passes through
the data.

My approach involves creating a dummy vector with a length equal to the size
of the Iris data (750 elements): > sample(1:10, 750, replace=TRUE). I then
assigned all values of 2 to be 0 and all others to be 1. This left me with
approximately 10% of the entries as "missing". I reshaped this into a 150
x 5 matrix. From here, things were pretty straightforward.
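
For concreteness, the dummy-vector approach described above might be sketched roughly as follows. This is only an illustration, not necessarily the exact code used: the names dat, draws and mask and the fixed seed are assumptions introduced here, and the built-in iris data frame stands in for the 150 x 5 Iris matrix.

```r
# Rough sketch of the dummy-vector approach; dat, draws and mask are illustrative names.
set.seed(1)                       # assumption: fixed seed so the result is repeatable
dat <- iris                       # work on a copy of the built-in data (150 x 5)
n <- nrow(dat)
p <- ncol(dat)

# Draw 750 values from 1..10; a 2 marks an entry as missing (0), anything else as kept (1).
draws <- sample(1:10, n * p, replace = TRUE)
mask <- matrix(ifelse(draws == 2, 0, 1), nrow = n, ncol = p)

# Never blank the classification variable (column 5, Species).
mask[, p] <- 1

# Set the flagged entries to NA in the copy.
dat[mask == 0] <- NA
```

Because the classification column is always kept, no case can end up with all of its values missing. The side effect is that roughly 10% of the 600 measurement entries are blanked, which is somewhat less than 10% of all 750 entries.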
Is there any way to bypass the dummy vector and operate directly on a copy
of the original Iris matrix and get to the point above without the
Dr. Marc R. Feldesman
Professor and Chairman
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email: feldesmanm at pdx.edu
PGP Key Available On Request
"Beyond every credibility gap lies a gullibility fill"
Powered by Latochoerus and Windows 2000, SP1