[R] randomly select duplicated entries

jim holtman jholtman at gmail.com
Wed Jul 9 22:42:42 CEST 2008


How about this:

> dat <- read.table(textConnection("Id         myvar
+ 12 1
+ 12 2
+ 12 6
+ 34 9
+ 34 4
+ 34 8
+ 65 15
+ 65 23"), header = TRUE)
> closeAllConnections()
> # split by Id, then choose one row at random from each group
> # (nrow() counts rows; length() of a data frame counts columns)
> x <- lapply(split(dat, dat$Id), function(.grp){
+     .grp[sample(seq_len(nrow(.grp)), 1), ]
+ })
> do.call(rbind, x)
   Id myvar
12 12     1
34 34     9
65 65    15
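
A variant of the same idea, offered only as a sketch: draw one row index per Id with tapply() and subset the data frame once, which avoids the rbind step. The set.seed() call and the names pick and res are my own additions for illustration, not part of the original question. sample.int() is used because sample(x, 1) treats a single number x as 1:x, which would give the wrong row for an Id that occurs only once.

```r
# Same example data as above, built directly so the sketch is self-contained
dat <- data.frame(Id    = c(12, 12, 12, 34, 34, 34, 65, 65),
                  myvar = c(1, 2, 6, 9, 4, 8, 15, 23))

set.seed(1)  # assumption: fixed seed only so the random draw is repeatable

# For each Id, pick one of its row indices at random
pick <- tapply(seq_len(nrow(dat)), dat$Id,
               function(i) i[sample.int(length(i), 1)])

# Keep only the chosen rows: one row per Id
res <- dat[as.integer(pick), ]
res
```

Drop the set.seed() line if you want a different selection on every run.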


On Wed, Jul 9, 2008 at 3:17 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> Using this data as an example
>
> dat <- read.table(textConnection("Id         myvar
> 12 1
> 12 2
> 12 6
> 34 9
> 34 4
> 34 8
> 65 15
> 65 23"), header = TRUE)
> closeAllConnections()
>
> how can I create another data set that does not have duplicate entries
> for 'Id', but the included values
> are randomly selected from the available ones.
>
> Thanks!
>
> Juliet



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


