[R] Selecting random subset by ID

Fri Sep 7 22:06:07 CEST 2018

IMO it is worth pointing out that you don't have to write code that solves your problem (else why have this list?). But this whole communication thing works best when you write code that creates a mock data set illustrating what you are starting from, along with some mock output showing what you want to end up with.

The mock input can sometimes be the output of the dput function on a subset of your data, but in your case would probably be something more like

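set.seed(42)
# one row per individual: id, an individual-level effect a1, and n observations
ids <- data.frame( id=1:8000, a1=rnorm(8000,0,1),n=sample(2:15,8000,replace=TRUE))
# repeat each individual's row n times to get an unbalanced panel
dta <- ids[rep(ids$id,ids$n),]
# observation-level noise, then the observed value
dta$a0 <- rnorm(nrow(dta),1,2)
dta$value <- with( dta, a0 + a1 )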

where the way I built the data may not match how your data is actually structured. Clearing up that kind of misunderstanding is exactly what a mock data set is for, which is why it is worth learning how to make one when you ask your question.
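
For the dput route mentioned above, a minimal sketch might look like the following. The tiny data frame here is made up purely for illustration (5 individuals rather than 8000), and "small" is just a name I picked; with real data you would subset your own data frame instead.

```r
# Hypothetical miniature version of the mock data (5 individuals only)
set.seed(42)
ids <- data.frame(id = 1:5, a1 = rnorm(5, 0, 1),
                  n = sample(2:3, 5, replace = TRUE))
dta <- ids[rep(ids$id, ids$n), ]

# Keep only the rows for the first two individuals
small <- dta[dta$id %in% 1:2, ]

# Print R code that recreates 'small'; paste that text into your email
dput(small)
```

Whoever reads the email can assign the pasted structure(...) text to a variable and have the same data frame you do, column types included.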

You may find that reading the above helps you answer your own question, or you can confirm that this data set is close enough and show what code you tried starting with this data.
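
As an illustration of the kind of attempt worth posting, here is a rough sketch built on the mock data and on sample(). It assumes the four subsets are drawn independently and may share individuals (4 x 2500 is more than 8000, so they cannot all be disjoint); "id_sets" and "subsets" are names I made up. Treat it as a starting point to react to, not as the answer for your real data.

```r
# Rebuild the mock data so this block runs on its own
set.seed(42)
ids <- data.frame(id = 1:8000, a1 = rnorm(8000, 0, 1),
                  n = sample(2:15, 8000, replace = TRUE))
dta <- ids[rep(ids$id, ids$n), ]
dta$a0 <- rnorm(nrow(dta), 1, 2)
dta$value <- with(dta, a0 + a1)

# Draw 4 independent samples of 2500 distinct ids each
# (assumption: overlap between the 4 samples is acceptable)
id_sets <- replicate(4, sample(unique(dta$id), 2500), simplify = FALSE)

# For each sample, keep every observation belonging to the sampled ids
subsets <- lapply(id_sets, function(s) dta[dta$id %in% s, ])

# Quick check: number of distinct individuals in each subset
sapply(subsets, function(d) length(unique(d$id)))
```

Sampling the ids rather than the rows keeps each selected individual's full set of observations together, which matters for panel models.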

Oh, and by the way, sending your emails to this list formatted as HTML is a good way to corrupt your code examples, because this list only forwards the plain text part of your email. Start with the plain text setting in your email program and avoid further miscommunication.

More on reproducible examples [1][2][3].

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)

On September 7, 2018 12:00:07 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>?sample
>
>Should get you started
>
>We expect you to first make an effort to learn about and write your
>own code, rather than asking us to write it for you.
>
>-- Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Fri, Sep 7, 2018 at 11:38 AM David Joubert
><David.Joubert using uottawa.ca> wrote:
>>
>> Hello R users,
>>
>> I am working with a large dataset, including roughly 50 000
>> sequential observations (variable "count") for 8000 individuals
>> (variable "id"). The dataset is very unbalanced, meaning that some
>> individuals have few observations and others have many. Because I plan
>> on running Generalized Linear Models for panel data using pglm and the
>> package has file size restrictions, I want to create 4 randomly
>> selected subsets of 2500 individuals from the main dataset. What
>> functions and code would I use to do this?
>>
>> Thanks in advance,
>>
>> David Joubert
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.