[R] Allelic Differentiation, sampling, unique(), duplicated()

Thomas Lumley tlumley at u.washington.edu
Thu Sep 4 17:09:59 CEST 2003


On Fri, 5 Sep 2003, Philip Rhoades wrote:

> Hi people,
>
> I have made some progress trying to work out how to solve this problem
> but I have got a bit stuck - sorry if this turns out to be a simple
> exercise . .
>
> Allelic Differentiation (AD) in genetics measures the number of
> different alleles between (say) two populations eg:
>
> Organisms in Pop 1 have alleles: a, b, c, d, e
>
> Organisms in Pop 2 have alleles: b, b, c, d, e
>
> Different (unique) alleles (n) are: a
>
> [unique() does not do what I want here for comparing these two vectors
> and I can't get combinations of unique() and duplicated() to work
> either.]

You could do it with

union(setdiff(one,two), setdiff(two,one))

and there's probably a direct way to do it with match().  We should
probably have a setsymdiff() function to add to the others.
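
As a minimal sketch of the two approaches mentioned above, using the example alleles from the question: the vectors `one` and `two` are filled in from Pop 1 and Pop 2, and `setsymdiff()` here is a hypothetical helper defined on the spot, not an existing base R function. The second form uses `%in%`, which is built on match().

```r
# Example data taken from the question
one <- c("a", "b", "c", "d", "e")
two <- c("b", "b", "c", "d", "e")

# Hypothetical helper: symmetric difference built from the existing set functions
setsymdiff <- function(x, y) union(setdiff(x, y), setdiff(y, x))

sym_a <- setsymdiff(one, two)

# Same idea using %in% (a thin wrapper around match())
sym_b <- unique(c(one[!(one %in% two)], two[!(two %in% one)]))

# n = number of alleles found in only one of the two populations
n <- length(sym_a)
```

Note that both forms drop duplicates, so the repeated "b" in Pop 2 does not count as a difference.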


> Total alleles = 10
>
> Therefore AD = (2 * n) / 10 = 0.2
>
> What I want to do is compare two populations of 200 organisms each but
> sampling for only 20 at a time.
>
> So there are 200!/((200-20)! * 20!) possible combinations of samples in
> each population.
>
> For all possible combinations of sample pop1 and sample pop2 I want to
> measure AD ie (200!/((200-20)! * 20!) * 200!/((200-20)! * 20!) )
> calculations.

This is far too many calculations:
R> choose(200,20)
[1] 1.613588e+27
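
To get a feel for the scale, here is a rough sketch of the arithmetic for all pairs of samples. The rate of 1e9 AD evaluations per second is an assumed, optimistic figure chosen only for illustration.

```r
# Number of distinct samples of 20 from one population of 200
n_samples <- choose(200, 20)

# Every sample from pop1 paired with every sample from pop2
n_pairs <- n_samples^2

# Assumed (hypothetical) throughput: 1e9 AD calculations per second
rate_per_sec <- 1e9
years_needed <- n_pairs / rate_per_sec / (60 * 60 * 24 * 365)
```

Even at that assumed rate, `years_needed` comes out many orders of magnitude beyond any practical run time.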


> As well as the unique allele problem, can someone suggest how I can do
> the sampling loops?
>

You can't, at least not exhaustively. 10^27 is a very large number.

I would suggest choosing pop1 and pop2 at random, a few thousand or
hundred thousand times (depending on the accuracy you need).
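
A minimal sketch of that random-sampling approach follows. The populations here are made-up data with one allele label per organism, which is an assumption about how the data are stored; `ad()`, `n_rep` and the seed are likewise illustrative choices, not anything from the original question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Made-up populations of 200 organisms, one allele label each (assumption)
pop1 <- sample(letters[1:10], 200, replace = TRUE)
pop2 <- sample(letters[3:12], 200, replace = TRUE)

# AD as defined in the question: 2 * n / (total alleles sampled),
# where n is the number of alleles seen in only one of the two samples
ad <- function(x, y) {
  n <- length(union(setdiff(x, y), setdiff(y, x)))
  2 * n / (length(x) + length(y))
}

# Draw 20 organisms from each population (without replacement), many times
n_rep  <- 5000
ad_sim <- replicate(n_rep, ad(sample(pop1, 20), sample(pop2, 20)))

ad_mean <- mean(ad_sim)
ad_se   <- sd(ad_sim) / sqrt(n_rep)  # Monte Carlo standard error of the mean
```

The standard error shrinks with the square root of `n_rep`, so the number of replicates can be raised until `ad_se` is small enough for the accuracy needed.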


	-thomas



