[R] Puzzle

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Aug 26 19:42:42 CEST 2010


Look for a vectorized solution such  as ?table or ?aggregate to obtain a list of combination counts, followed by logical indexing or ?subset to get a set of valid combinations, and then use ?sample to get your random selections of locations/surveyors and then process only those combinations from your original data set. Keep in mind that constraints that you apply late in the selection process that can be applied earlier or on less data will usually become more efficient.

If you provide actual sample statements that you have already tried, you may get a less abstract response. See the posting guide mentioned at the end of every posting for more guidance.

"Ben Holt" <BHolt at bio.ku.dk> wrote:

>I have data similar to this:
>
>Location Surveyor Result
>A        1         83
>A        2         76
>A        3         45
>B        1         71
>B        4         67
>C        2         23
>C        5         12
>D        3         34
>E        4         75
>F        4         46
>G        5         90
>etc (5 million records in total)
>
>I need to divide the data to many subsets then randomly select 5 different locations and 5 different surveyors (one at each of the 5 randomly selected locations) for each subset.
>
>The function I have written basically picks five locations and then 1 surveyor in each location, checks that there are five different surveyors and if there isn't tries again.  The problem is that for some subsets this doesn't work.
>
>Some subsets don't have enough locations/surveyors or both, but this can be checked for easily.  The problem subsets do have enoughs locations and surveyors but still cannot produce 5 locations each with a different surveyor.  The matrix below demonstrates such a subset:
> 
>                  locations
>                  A B C D E
>                1 1 0 0 0 0
>Surveyors       2 1 0 0 0 0
>                3 1 0 0 0 0
>                4 1 0 0 0 0
>                5 1 1 1 1 1
>
>I cannot think of a way to check for such a situation and therefore I have simply programmed the function to give up after 100 attempts if it can't find a solution.  This is not very satisfactory however as the analysis takes a very long time to run and it would also be very useful useful for me to know how many suitable solution there are.
>
>I reckon some of you clever folk out there must be able to think of a better solution.
>
>Any advice appreciated,
>
>Ben
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.



More information about the R-help mailing list