[R] Randomly drop a percent of data from a data.frame

arun smartpink111 at yahoo.com
Fri Aug 16 23:34:55 CEST 2013


Hi,
Maybe this helps:
#data1 (changed `data` to `data1`)
set.seed(6245)
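data1 <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
data1 <- round(data1,digits=3)

data2 <- data1   # untouched copy for the second run below

# Keep a random 80% of the x3/x4 values and set every value that was not drawn to NA.
# sample() is redrawn for each column, so the number of NAs varies from run to run.
data1[,3:4] <- lapply(data1[,3:4], function(x) {
  keep <- sample(unlist(data1[,3:4]), round(0.8*length(unlist(data1[,3:4]))))
  x[is.na(match(x, keep))] <- NA
  x
})
data1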
#      x1     x2     x3     x4
#1  0.482  1.320     NA -0.142
#2 -0.753 -0.041 -0.063  0.886
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633


# or, starting again from the untouched copy (a different random draw):
data2[,3:4] <- lapply(data2[,3:4], function(x) {
  keep <- sample(unlist(data2[,3:4]), round(0.8*length(unlist(data2[,3:4]))))
  x[is.na(match(x, keep))] <- NA
  x
})
data2
#      x1     x2     x3     x4
#1  0.482  1.320 -0.859 -0.142
#2 -0.753 -0.041     NA     NA
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633
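
If "20% of the data" means 20% of all cells (4 of the 20, as in each of the example tables in the question), the value-matching approach above will not always hit that count, because it draws a new sample for each column. A minimal sketch of one way to get an exact count is to draw the cell positions once across x3 and x4 together. The names data3, cols, n_drop and m are only illustrative and do not come from the thread.

```r
# Sketch: set exactly 20% of all cells to NA, choosing only from x3 and x4.
set.seed(6245)
data3 <- round(data.frame(x1 = rnorm(5), x2 = rnorm(5),
                          x3 = rnorm(5), x4 = rnorm(5)), digits = 3)

cols   <- c("x3", "x4")                           # columns allowed to lose values
n_drop <- round(0.2 * nrow(data3) * ncol(data3))  # 20% of 20 cells = 4

# Treat the candidate columns as one matrix so its cells can be indexed 1..10.
m <- as.matrix(data3[, cols])
m[sample(length(m), n_drop)] <- NA                # a single draw, so the count is exact
data3[cols] <- as.data.frame(m)

sum(is.na(data3))       # 4
colSums(is.na(data3))   # x1 and x2 stay complete; the 4 NAs fall in x3 and/or x4
```

Which cells end up as NA depends on the seed and on the R version (the default behaviour of sample() changed in R 3.6.0), so only the counts are stated here, not a printed table.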
A.K.



----- Original Message -----
From: Christopher Desjardins <cddesjardins at gmail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Friday, August 16, 2013 3:02 PM
Subject: [R] Randomly drop a percent of data from a data.frame

Hi,
I have the following data.

> set.seed(6245)
> data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
> round(data,digits=3)
      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256 -0.069  0.354
4 -0.086  0.475  0.244  0.781
5  0.690 -0.181  1.274  1.633

What I would like to do is drop 20% of the data, but I want this 20% to
come only from x3 and x4. It doesn't have to be spread evenly,
i.e. I don't need to drop 2 from x3 and 2 from x4, or to make sure only one
observation has missing data on only one variable. I just want to drop 20%
of the data through x3 and x4 only. In other words,

      x1     x2     x3     x4
1  0.482  1.320 -0.859     NA
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA  0.354
4 -0.086  0.475     NA  0.781
5  0.690 -0.181     NA  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320     NA -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063     NA
3  0.028 -0.256 -0.069     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274     NA

ETC. are all fine.

Any ideas how I can do this?
Chris




