[R] Not missing at random

Joshua Wiley jwiley.psych at gmail.com
Tue Jun 7 16:05:35 CEST 2011


Hi Blaz,

What should happen if the number of values sampled to be set missing
(e.g., 4) is greater than the number of values in a given case that
fall below your threshold of 3?  If that needs no special handling, I
do not see why you cannot apply the same technique you used below for
MCAR to MNAR.
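
As a rough sketch of that idea (not a definitive implementation), the
code below combines the per-case sampling from the MCAR code with the
"< 3" candidate selection. It assumes that when the sampled count
exceeds the number of eligible values in a case, the count is simply
capped at what is available; that capping rule, the fixed seed, and the
names value, eligible, k, and toNA are illustrative assumptions, not
something from the original code.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

## same 7 x 7 example matrix as in the quoted code below
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)

pMiss <- 30   # percent of cases to select
m     <- 3    # maximum number of values set missing per selected case
value <- 3    # cut-off: only values below this may become missing

## rows with at least one value below the cut-off
candidate <- unique(which(x < value, arr.ind = TRUE)[, "row"])

## sample cases, never asking for more than there are candidates;
## indexing with sample.int() avoids the sample() surprise when
## candidate has length 1
nCases <- min(length(candidate), floor(nrow(x) * pMiss / 100))
idMiss <- candidate[sample.int(length(candidate), nCases)]

xMiss <- x
for (i in idMiss) {
  eligible <- which(x[i, ] < value)             # columns below the cut-off
  k <- min(sample.int(m, 1), length(eligible))  # assumed rule: cap the draw
  toNA <- eligible[sample.int(length(eligible), k)]
  xMiss[i, toNA] <- NA
}

xMiss
```

If capping is not what you want, the alternative would be to redraw the
count or to restrict the candidates to cases with at least m eligible
values.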

Best regards,

Josh

On Tue, Jun 7, 2011 at 12:17 AM, Blaz Simcic <blazsimcic at yahoo.com> wrote:
> Josh,
>
> Thanks for the answer, it really helped me. I have another question, in
> case you know how to do it.
>
> I would also like to sample the number of missing values within the
> selected cases, as I did with MCAR (see below).
>
> Can you help me with this?
>
> Thanks,
>
> Blaz from Slovenia
>
> Here is my code for MCAR:
>
> N <- 1000      ####number of cases
>
> n <- 12           ####number of variables
>
> X <- matrix(rnorm(N * n), N, n)    ####matrix
>
> pMiss <- 0.20     ####proportion of cases with missing values
>
> idMiss <- sample(1:N, N * pMiss)    ####sample cases
>
> nMiss <- length(idMiss)
>
> m <- 3    ####maximum number of missing values within selected cases
>
> howmanyMiss <- sapply(idMiss, function(x) sample(1:m, 1))
>
> howmanyMiss  #### number of missing values within selected cases
>
> ## which values are missing
> varMiss <- lapply(howmanyMiss, function(x) sample(1:n, x))
>
> ids <- cbind(rep(idMiss, howmanyMiss), unlist(varMiss))
>
> Xmiss <- X
>
> Xmiss[ids] <- NA
>
> Xmiss
>
> ________________________________
> From: Joshua Wiley <jwiley.psych at gmail.com>
> To: Blaz Simcic <blazsimcic at yahoo.com>
> Cc: r-help at r-project.org
> Sent: Mon, June 6, 2011 10:34:38 PM
> Subject: Re: [R] Not missing at random
>
> Hi Blaz,
>
> See below.
>
> x <-
> matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
> nrow = 7, ncol=7, byrow=TRUE) ####matrix
>
> pMiss <- 30    ####percent of missing values
>
> N <- dim(x)[1]  ####number of cases
>
> ## I want to sample all cases with at least 1 value lower than 3,
> ## so I have to find candidates
> candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 |
>                    x[,6]<3 | x[,7]<3)
>
> ## easier to use this
> ## find all x < 3 and return their row and column indices
> ## select only row indices, and then find unique
> candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
>
> idMiss <- sample(candidate, N * pMiss / 100)  #### I sampled cases
>
> ## from the subset of x cases that will be missing
> ## find all that are < 3 and set to NA
> x[idMiss, ][x[idMiss, ] < 3] <- NA
>
> ## If you are going to do this a lot, consider a function
> nmar <- function(x, op = "<", value = 3, p = 30) {
>   op <- get(op)
>   candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
>   idMiss <- sample(candidate, nrow(x) * p / 100)
>   x[idMiss, ][op(x[idMiss, ], value)] <- NA
>   return(x)
> }
>
> nmar(x)
>
> ## has the advantage that you can easily change
> ## p, the cut off value, the operator (e.g., "<", ">", "<=", etc.)
>
> Cheers,
>
> Josh
>
> On Sun, Jun 5, 2011 at 11:17 PM, Blaz Simcic <blazsimcic at yahoo.com> wrote:
>>
>>
>> Hello!
>>
>> I would like to sample 30% of cases (with at least 1 value lower than 3
>> in the row) and among them I want to set all values lower than 3 (within
>> the selected cases) as NA (NMAR, not missing at random). I managed to
>> sample the cases, but I don’t know how to set the values (lower than 3)
>> as NA.
>>
>> R code:
>>
>> x <-
>>
>> matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
>>  nrow = 7, ncol=7, byrow=TRUE) ####matrix
>>
>> pMiss <- 30     ####percent of missing values
>>
>> N <- dim(x)[1]   ####number of cases
>>
>> ## I want to sample all cases with at least 1 value lower than 3,
>> ## so I have to find candidates
>> candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 |
>>                    x[,6]<3 | x[,7]<3)
>>
>> idMiss <- sample(candidate, N * pMiss / 100)    #### I sampled cases
>>
>> Now I'd like to set all values lower than 3 among the sampled cases to NA.
>>
>> Any suggestion?
>>
>> Thanks,
>> Blaž
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>





