[R] randomly sample within clustered data?

Brown, Tony Nicholas tony.n.brown at Vanderbilt.Edu
Mon Sep 15 10:23:24 CEST 2008


Thierry,

Thanks so much. Your solution works perfectly.

Tony

-----Original Message-----
From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] 
Sent: Monday, September 15, 2008 2:56 AM
To: Brown, Tony Nicholas; r-help at r-project.org
Subject: RE: [R] randomly sample within clustered data?

Something like this?

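# Split the data frame by id, draw 2 rows from each piece, then stack the pieces.
do.call("rbind", 
	lapply(
		split(Dataf, Dataf$id), 
		function(x){
			# sample row indices, not the data frame itself
			x[sample(seq_len(nrow(x)), size=2), ]
		}
	)
)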
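
A self-contained version of the approach above, as a minimal sketch rather
than a definitive implementation. The helper name sample_per_cluster, its
arguments, and the seed are assumptions made for illustration; they do not
appear in the thread. The min() guard is also an addition: it lets a cluster
with fewer than n rows contribute all of its rows instead of raising an error.

```r
# Reproducible sketch of the split / sample / rbind approach.
set.seed(42)  # arbitrary seed, only so the draw is repeatable

id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
Dataf <- data.frame(
  id       = id,
  sex      = sample(c("m", "f"), 20, replace = TRUE),
  weight   = rnorm(n = 20, mean = 150, sd = 3),
  attitude = sample(1:7, 20, replace = TRUE)
)

# sample_per_cluster is a hypothetical helper name, not from the thread
sample_per_cluster <- function(df, cluster, n = 2) {
  pieces <- lapply(split(df, df[[cluster]]), function(x) {
    # take at most n rows so clusters smaller than n do not raise an error
    x[sample(seq_len(nrow(x)), size = min(n, nrow(x))), , drop = FALSE]
  })
  out <- do.call("rbind", pieces)
  rownames(out) <- NULL  # drop the "100.3"-style row names that rbind creates
  out
}

Dataf2 <- sample_per_cluster(Dataf, "id", n = 2)
print(Dataf2)
print(table(Dataf2$id))
```

Sampling row indices with seq_len(nrow(x)) keeps every column intact and
also behaves correctly when a cluster has a single row.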
 
HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Brown, Tony Nicholas
Sent: Monday, September 15, 2008 9:40
To: r-help at r-project.org
Subject: [R] randomly sample within clustered data?

Dear useRs,



What is an efficient way to randomly sample from clustered data such
that I get equal representation from each cluster? For example, let's
say I want to randomly sample two cases from each cluster created by the
"id" variable in the following data frame:

> id<-c(rep("100", 4),rep("101", 3), rep("102", 6), rep("103", 7))
> sex<-sample(c("m","f"), 20, replace=TRUE)
> weight<-rnorm(n=20, mean=150, sd=3)
> attitude<-sample(1:7, 20, replace=TRUE)
> Dataf<-data.frame(id,sex,weight,attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7

Here's the R code I wrote that obviously does not work:



sapply(split(Dataf, Dataf$id), sample, size=2)
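
For context on why this call fails: sample() treats a data frame as a list
of columns, so sample(x, size=2) draws two columns and keeps every row. The
sketch below shows this on a small made-up data frame (df and its columns
a to d are illustrative only and are not part of the post), next to the
row-index form that draws rows instead.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# sample() on a data frame draws from its columns, because a data frame
# is a list of columns and length(df) is the number of columns
picked <- sample(df, size = 2)
print(dim(picked))  # all 3 rows are kept, but only 2 columns remain

# sampling row indices instead keeps every column and draws 2 rows
rows <- df[sample(seq_len(nrow(df)), size = 2), ]
print(dim(rows))
```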



I would prefer a data frame (i.e., Dataf2) as the final output and it
should look something like this:

> Dataf2
    id sex   weight attitude
1  100   m 146.5064        6
2  100   m 144.7218        7
3  101   m 147.9071        4
4  101   m 154.4634        1
5  102   m 153.2719        5
6  102   m 148.9821        5
7  103   f 155.4976        4
8  103   f 146.0487        5



Thanks in advance for your assistance.



Tony





------------------------------------------------------------------

Tony N. Brown, Ph.D.
Associate Professor of Sociology
Faculty Head of Hank Ingram House, The Commons
Research Fellow, Vanderbilt Center for Nashville Studies
Vanderbilt University
(615) 322-7518
(615) 322-7505 fax
tony.n.brown at vanderbilt.edu





______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.


