[R] Constrained vector permutation

Charles C. Berry cberry at tajo.ucsd.edu
Sat Jan 30 04:23:07 CET 2010


On Fri, 29 Jan 2010, Andrew Rominger wrote:

> Being reasonably sure that all valid permutations are equally probable is
> important to me.  I've played around with search algorithms in permuting
> contingency tables and find that possible solutions decrease rapidly once
> one starts assigning values, particularly if small values are assigned
> first, so it would seem all solutions are not equally probable (not only
> that but one frequently encounters "dead ends" where there are values left
> to assign and no allowable place to put them).  As such I think I'd opt to
> use sample()... several times if needed.
>
> To clarify, yes, I only need one valid permutation, the idea is I'll
> generate 1000s of ordered vectors, and then for each one generate one valid
> permutation.
>
> Thanks very much for the help and insights--
> Andy

Andy,

If you have some sense of importance sampling and/or MCMC you might look 
at

Zaman and Simberloff (2002, Environmental and Ecological Statistics 9,
405--421).

which concerns sampling a binary matrix with fixed margins - not quite 
your problem, but akin to it in being a combinatorial nightmare without 
an obvious direct solution of workable size for real problems.

They define a neighborhood for each allowable matrix s.t. swapping a pair 
of 1's at ij and kl with a pair of 0's at il and kj  (which doesn't 
violate the margin constraints) leads to a member of the neighborhood. 
IIRC, the size of the neighborhood and the sizes of the neighborhoods of 
the members of its neighborhood determine the probabilities of staying put 
or moving to get the next element of the MCMC chain and which member of 
the neighborhood to choose.

I suppose something like that (i.e. defining neighborhoods of allowable 
permutations, measuring their size, and using this to guide sampling or 
develop importance weights) might apply in your case. Maybe something like 
this: start with an ordering of your n-vector that conforms to the 
constraints, look at all the choose(n,2) pairs of elements and check which 
of them could be exchanged to yield another conforming ordering; the 
allowable swaps define the neighborhood, and their number is its size.

So, the idea is to develop an MCMC sampler. Run it for each ordered vector 
to get past the burn in, then take one value from it.

HTH,

Chuck

>
>
> On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley <tlumley at u.washington.edu>wrote:
>
>> On Thu, 28 Jan 2010, Jason Smith wrote:
>>
>>  It wouldn't be guaranteed to produce any usable permutation, but it seems
>>>> like it would be much faster and so could be repeated until an acceptable
>>>> vector is found.  What do you think?
>>>>
>>>> Thanks--
>>>> Andy
>>>>
>>>>
>>> I think I am not understanding what your ultimate goal is so I'm not
>>> sure I can give you appropriate advice.  Are you looking for a single
>>> valid permutation or all of them?
>>>
>>> Since that constraint sets a ceiling on each subsequent value, it
>>> seems like you could solve this problem more easily and quickly by
>>> using a search strategy instead of random sampling or generating all
>>> permutations then testing.  The constraint will help prune the search
>>> space so you only generate valid permutations.  Once you are examining
>>> a particular element you can determine which of the additional
>>> elements would be valid, so only consider those.
>>>
>>
>> It's easy to generate valid permutations this way.  It does not appear
>> straightforward to ensure that all valid permutations are sampled with equal
>> probability, which I thought was part of the specification of the problem.
>>
>>      -thomas
>>
>>
>> Thomas Lumley                   Assoc. Professor, Biostatistics
>> tlumley at u.washington.edu        University of Washington, Seattle
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-help mailing list