[R] Sampling question

arun smartpink111 at yahoo.com
Tue Nov 5 21:59:31 CET 2013


Hi,

You may try:
dat1 <- structure(list(SubID = 1:8, CSE1 = c(6L, 6L, 5L, 5L, 5L, 5L, 
3L, 3L), CSE2 = c(5L, 4L, 5L, 4L, 6L, 4L, 6L, 6L), CSE3 = c(6L, 
7L, 5L, 3L, 7L, 3L, 6L, 6L), CSE4 = c(2L, 2L, 5L, 4L, 5L, 6L, 
3L, 3L), WSE1 = c(6L, 6L, 5L, 4L, 6L, 4L, 6L, 6L), WSE2 = c(2L, 
6L, 5L, 4L, 4L, 3L, 5L, 5L), WSE3 = c(2L, 2L, 4L, 5L, 4L, 7L, 
2L, 4L), WSE4 = c(4L, 3L, 5L, 2L, 1L, 3L, 1L, 7L)), .Names = c("SubID", 
"CSE1", "CSE2", "CSE3", "CSE4", "WSE1", "WSE2", "WSE3", "WSE4"
), class = "data.frame", row.names = c(NA, -8L))


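fun1 <- function(dat, nrep){
  res <- replicate(nrep, {
    # Shuffle the subjects, then shuffle the four CSE items (columns 2:5)
    # within each subject; each element is a one-row data frame
    lst1 <- lapply(sample(nrow(dat)), function(x) sample(dat[x, 2:5], 4))
    names(lst1) <- sapply(lst1, row.names)

    # Rows 3 to 8: keep the first (random) CSE item and append, in random
    # order, the three WSE items (columns 6:9) with a different item number
    lst1[-(1:2)] <- lapply(names(lst1)[-(1:2)], function(i) {
      wse <- dat[i, 6:9]
      cse1 <- lst1[[i]][1]
      # dropping the first letter maps both "CSE2" and "WSE2" to "SE2"
      x1 <- wse[sub("^.", "", names(wse)) != sub("^.", "", names(cse1))]
      cbind(cse1, sample(x1, 3))
    })

    # Prepend SubID, give the four value columns a common name, stack the rows
    do.call(rbind, lapply(lst1, function(x) {
      datNew <- cbind(SubID = as.numeric(row.names(x)), x)
      names(datNew)[-1] <- "var"
      datNew
    }))
  })
  res
}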

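res1 <- fun1(dat1, 5)
# replicate() returns a list-matrix (rows = SubID and the four "var"
# columns, one column per sample); rebuild each sample as a matrix:
lst2 <- lapply(split(res1, col(res1)), function(x) {
  dat <- do.call(cbind, x)
  colnames(dat) <- c("SubID", rep("var", 4))
  dat
})
do.call(cbind, res1[, 1])   # first sample
do.call(cbind, res1[, 2])   # second sample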
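Why the extra lst2 step is needed: each replicate of fun1() returns a data frame, and replicate() (which calls sapply() with simplify = "array") turns a set of equal-length data frames into a matrix of lists, with one row per data-frame column and one column per replicate. A toy illustration with made-up data frames; it also shows that simplify = FALSE keeps every replicate as its own data frame, which may be easier to handle when drawing 1000 samples:

```r
# Each replicate returns a small data frame with two columns
m <- replicate(3, data.frame(a = 1:2, b = 3:4))
is.matrix(m)   # a 2 x 3 matrix whose cells are the data-frame columns
dim(m)
m[, 1]         # list holding the columns of the first data frame

# With simplify = FALSE the result is a plain list of data frames
l <- replicate(3, data.frame(a = 1:2, b = 3:4), simplify = FALSE)
length(l)
l[[1]]
```

With the second form, fun1() could return its list directly and res1[[1]] would already be the first sample.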
A.K.




I have a question about drawing samples from a data frame. This might 
sound really tricky. Let me use a data frame I posted earlier as an example: 

    SubID  CSE1  CSE2  CSE3  CSE4  WSE1  WSE2  WSE3  WSE4
        1     6     5     6     2     6     2     2     4
        2     6     4     7     2     6     6     2     3
        3     5     5     5     5     5     5     4     5
        4     5     4     3     4     4     4     5     2
        5     5     6     7     5     6     4     4     1
        6     5     4     3     6     4     3     7     3
        7     3     6     6     3     6     5     2     1
        8     3     6     6     3     6     5     4     7

This data frame has two sets of variables, and each set represents one
scale. As shown above, the first scale, say CSE, consists of four items
(CSE1, CSE2, CSE3, and CSE4), whereas the second scale, say WSE, also has
four items (WSE1, WSE2, WSE3, and WSE4). The leftmost column lists the
subjects' IDs.
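Because both scales use the same item numbers (CSE1 pairs with WSE1, and so on), the pairing can be recovered from the column names instead of hard-coded column positions. A minimal sketch, assuming every column name is a letter prefix followed by the item number; item_no() is a hypothetical helper, and the one-row data frame holds subject 6's values from the table above:

```r
# Subject 6 from the table above (one row is enough for this illustration)
dat <- data.frame(SubID = 6L, CSE1 = 5L, CSE2 = 4L, CSE3 = 3L, CSE4 = 6L,
                  WSE1 = 4L, WSE2 = 3L, WSE3 = 7L, WSE4 = 3L)

cse <- grep("^CSE", names(dat), value = TRUE)   # "CSE1" ... "CSE4"
wse <- grep("^WSE", names(dat), value = TRUE)   # "WSE1" ... "WSE4"

# Hypothetical helper: strip the scale prefix, keeping the item number
item_no <- function(x) sub("^[A-Z]+", "", x)

# Pair each CSE item with the WSE item that has the same number
pairs <- data.frame(cse = cse, wse = wse[match(item_no(cse), item_no(wse))],
                    stringsAsFactors = FALSE)
pairs

# WSE items that do NOT correspond to CSE2, and their values for subject 6
keep <- wse[item_no(wse) != item_no("CSE2")]
keep                                      # "WSE1" "WSE3" "WSE4"
unlist(dat[1, keep], use.names = FALSE)   # 4 7 3
```

The last line reproduces the subject-6 example given in rule 2 of the question.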

I want to create a new data frame by randomly sampling numbers from the
data frame above. Below is the structure of the new data frame.

    SubID  var  var  var  var
        s    c    c    c    c
        s    c    c    c    c
        s    c    w    w    w
        s    c    w    w    w
        s    c    w    w    w
        s    c    w    w    w
        s    c    w    w    w
        s    c    w    w    w

In the new data frame:

s = SubID (ranges from 1 to 8)
var = variables
c = CSE numbers
w = WSE numbers

Some rules for constructing the new data frame:

1. The top two rows have to be filled with CSE numbers, and the numbers
in each row should be in random order. For example, if the first row is
an array of numbers from subject 4, they can follow the order 4(CSE2),
5(CSE1), 3(CSE3), and 4(CSE4). Also, the numbers in the second row do not
have to follow the order of the first row. For example, if the first row
comes from subject 4 in the order 4(CSE2), 5(CSE1), 3(CSE3), and 4(CSE4),
the numbers in the second row (assuming it comes from subject 8) do not
have to be 6(CSE2), 3(CSE1), 6(CSE3), and 3(CSE4). The numbers in these
two rows should be drawn without replacement.

2. Each of the remaining rows should contain a CSE number in the leftmost
cell and three WSE numbers on the right. In each row, the three WSE
numbers must be only those that do not correspond to the CSE number in
the leftmost cell. For example, if the CSE number in the leftmost cell is
4, the CSE2 number of subject 6, the three WSE numbers on the right can
only be 4(WSE1), 7(WSE3), and 3(WSE4) from subject 6.

3. All numbers in a row must be drawn from the same subject. Also, the
order of the subjects should be randomized. Specifically, they do not
have to be in the following order:

    SubID
        1
        2
        3
        4
        5
        6
        7
        8

They can be, for example:

    SubID
        2
        8
        5
        4
        1
        6
        7
        3

4. Repeat the whole process 1000 times to draw 1000 random samples.
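Rules 1 to 4 can also be sketched by selecting items by column name rather than by position. This is a hypothetical alternative to fun1() in the reply above, not code from either poster: draw_one() is an illustrative helper name, it assumes the columns follow the CSEn/WSEn naming pattern, and it returns each sample as an 8 x 5 numeric matrix. replicate(..., simplify = FALSE) keeps the 1000 samples as a plain list.

```r
# Data from the question (same values as dat1 in the reply)
dat1 <- data.frame(
  SubID = 1:8,
  CSE1 = c(6L, 6L, 5L, 5L, 5L, 5L, 3L, 3L),
  CSE2 = c(5L, 4L, 5L, 4L, 6L, 4L, 6L, 6L),
  CSE3 = c(6L, 7L, 5L, 3L, 7L, 3L, 6L, 6L),
  CSE4 = c(2L, 2L, 5L, 4L, 5L, 6L, 3L, 3L),
  WSE1 = c(6L, 6L, 5L, 4L, 6L, 4L, 6L, 6L),
  WSE2 = c(2L, 6L, 5L, 4L, 4L, 3L, 5L, 5L),
  WSE3 = c(2L, 2L, 4L, 5L, 4L, 7L, 2L, 4L),
  WSE4 = c(4L, 3L, 5L, 2L, 1L, 3L, 1L, 7L)
)

# Hypothetical helper: one sample following rules 1-3
draw_one <- function(dat) {
  cse <- grep("^CSE", names(dat), value = TRUE)
  wse <- grep("^WSE", names(dat), value = TRUE)
  ids <- sample(nrow(dat))                 # rule 3: random order of subjects
  rows <- lapply(seq_along(ids), function(k) {
    r <- ids[k]
    if (k <= 2) {
      items <- sample(cse)                 # rule 1: all four CSE items, shuffled
    } else {
      first <- sample(cse, 1)              # rule 2: one random CSE item ...
      # ... followed by the three WSE items with a different item number
      others <- wse[sub("^WSE", "", wse) != sub("^CSE", "", first)]
      items <- c(first, sample(others))
    }
    # every value in the row comes from the same subject (rule 3)
    c(SubID = dat$SubID[r], var = unlist(dat[r, items], use.names = FALSE))
  })
  do.call(rbind, rows)                     # columns: SubID, var1 ... var4
}

set.seed(1)   # only to make the draws reproducible
samples <- replicate(1000, draw_one(dat1), simplify = FALSE)   # rule 4
samples[[1]]
```

Keeping the samples in a list means samples[[i]] is the i-th sample directly, without the list-matrix step that fun1() needs.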

Any ideas?  Thanks in advance!! :)



More information about the R-help mailing list