[R] selecting a random sample and saving it in a separate data frame and also the remaining part in another data frame

arun smartpink111 at yahoo.com
Thu Nov 22 16:57:15 CET 2012


Hi Madhu,

I guess you got your solution from Rui:
dat1 <- data.frame(x=c(1,1,2,2,2,3,4,4,4), y=c(23,45,87,46,78,12,87,79,76))
# sample 80% of the distinct values of x
s <- sample(unique(dat1[,1]), length(unique(dat1[,1]))*0.8)
s
#[1] 3 4 2
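
The size passed to sample() above is 0.8 * 4 = 3.2, which is not a whole number, so rounding it explicitly makes the intent clearer. A minimal sketch, assuming you want the nearest whole number of groups; the names ids and n_keep and the seed value are only for illustration and are not part of the original solution:

```r
dat1 <- data.frame(x = c(1,1,2,2,2,3,4,4,4),
                   y = c(23,45,87,46,78,12,87,79,76))

set.seed(1)                          # arbitrary seed, only to make the draw repeatable
ids <- unique(dat1$x)                # distinct values of x: 1 2 3 4
n_keep <- round(length(ids) * 0.8)   # 0.8 * 4 = 3.2, rounded to 3
s <- sample(ids, n_keep)             # 3 distinct values of x, in random order
```

Use floor() or ceiling() instead of round() if you always want to round down or up.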
You can keep both data frames in a list:
# first element: rows whose x was sampled; second element: the remaining rows
list1 <- list(dat1[dat1$x %in% s, ], dat1[!dat1$x %in% s, ])
list1
#[[1]]
#  x  y
#3 2 87
#4 2 46
#5 2 78
#6 3 12
#7 4 87
#8 4 79
#9 4 76

#[[2]]
#  x  y
#1 1 23
#2 1 45

is.data.frame(list1[[1]])
#[1] TRUE
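
If you need this split more than once, the steps above could be wrapped in a small function. This is only a sketch under my own assumptions: split_by_group, group_col, frac and the element names train and valid are hypothetical, not something from Rui's solution.

```r
# Hypothetical helper: split a data frame in two by sampling whole groups,
# so that all rows sharing a value of group_col end up in the same part.
split_by_group <- function(df, group_col, frac = 0.8) {
  ids <- unique(df[[group_col]])
  n_keep <- round(length(ids) * frac)
  # index with sample.int() so a single numeric id is not expanded to 1:id
  keep <- ids[sample.int(length(ids), n_keep)]
  in_sample <- df[[group_col]] %in% keep
  list(train = df[in_sample, , drop = FALSE],
       valid = df[!in_sample, , drop = FALSE])
}

dat1 <- data.frame(x = c(1,1,2,2,2,3,4,4,4),
                   y = c(23,45,87,46,78,12,87,79,76))
set.seed(123)                        # arbitrary seed for a repeatable split
parts <- split_by_group(dat1, "x")
parts$train                          # rows for the 3 sampled values of x
parts$valid                          # rows for the remaining value of x
```

Naming the list elements lets you write parts$train and parts$valid instead of list1[[1]] and list1[[2]].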

A.K.



----- Original Message -----
From: Madhu Ganganapalli <mganganapalli at upstreamsoftware.com>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Thursday, November 22, 2012 2:53 AM
Subject: selecting a random sample and saving it in a separate data frame and also the remaining part in another data frame

My question is:

I have the following data frame, and the distinct values of variable x are 1, 2, 3 and 4.

    data <- data.frame(x=c(1,1,2,2,2,3,4,4,4), y=c(23,45,87,46,78,12,87,79,76))

My data has 9 observations but only 4 distinct values of x, so taking 80% of the data means drawing a random sample from these 4 distinct values only.
Suppose that the 80% random sample from x gives 1, 2 and 3 (80% of 4 is 3.2, so roughly 3 values). Then I want the following output in a separate data frame:
X   y
1  23
1  45
2  87
2  46
2  78
3  12

That means if 1 is in my 80% random sample, then all rows with x equal to 1 should be in that data frame.

One more thing: after creating this data frame, only one distinct value of x, namely 4, remains in the original data frame.

What I mean is that I need to get two data sets at the same time, in two different data frames, each in the output format above.

In this case the second data frame is

X   y
4  87
4  79
4  76

This will help when building a model, because we use only 80% of the data for modeling and the remaining 20% for validation. That is why I want the two data sets at the same time, in two different data frames.

Please help me.

Thanks,
Madhu.


