[R] linkage disequilibrium

Mon Aug 8 14:30:55 CEST 2005

> Date: Thu,  4 Aug 2005 19:36:35 +0200
> From: Cristian <cristian at biometria.univr.it>
> Subject: [R] linkage disequilibrium
> To: r-help at stat.math.ethz.ch
> Message-ID: <1123176995.42f252238a47a at biometria.univr.it>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I'm using the package "genetics", and I'm interested in computing the D'
> statistic for linkage disequilibrium, which the LD() function provides.
> Unfortunately I can't find any reference on how D' is computed by LD().
> The package documentation refers to it only as an "MLE", without giving
> references. Does anybody know how it is obtained or, at least, have some
> references?
>
> Are there any other R packages that compute D' for both phased and
> unphased genotypes?
>
> Thanks!  Cristian
>

You need to look at the code:
getAnywhere("LD.genotype")

See any standard reference such as Bruce Weir's _Genetic Data Analysis_
(Sinauer Associates) or Pak Sham's book on statistical genetics for the
background to the algorithm.
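
As a rough illustration of the kind of calculation those references
describe, here is a sketch in base R of a maximum likelihood estimate of
the AB haplotype frequency from unphased genotypes at two biallelic loci,
followed by D and D'. It is not the code from the genetics package:
ld_mle() and the genotype counts are made up for the example, and D' is
returned here with the sign of D, which is only one of the conventions in
use.

```r
# Sketch only: ld_mle() is a hypothetical helper, not the genetics code.
# n is a 3x3 table of genotype counts: n[i, j] is the number of people
# with (i - 1) copies of allele A and (j - 1) copies of allele B.
ld_mle <- function(n, eps = 1e-8) {
  N  <- sum(n)
  pA <- sum((0:2) * rowSums(n)) / (2 * N)   # frequency of allele A
  pB <- sum((0:2) * colSums(n)) / (2 * N)   # frequency of allele B
  Dmin <- max(-pA * pB, -(1 - pA) * (1 - pB))
  Dmax <- min(pA * (1 - pB), (1 - pA) * pB)

  # Haplotype counts that can be read off without knowing phase
  nAB <- 2 * n[3, 3] + n[3, 2] + n[2, 3]
  nAb <- 2 * n[3, 1] + n[3, 2] + n[2, 1]
  naB <- 2 * n[1, 3] + n[2, 3] + n[1, 2]
  nab <- 2 * n[1, 1] + n[1, 2] + n[2, 1]

  # Log-likelihood in pAB alone; the allele frequencies are held fixed.
  # Double heterozygotes (n[2, 2]) are AB/ab or Ab/aB, phase unknown.
  loglik <- function(pAB) {
    pAb <- pA - pAB
    paB <- pB - pAB
    pab <- 1 - pA - pB + pAB
    nAB * log(pAB) + nAb * log(pAb) + naB * log(paB) + nab * log(pab) +
      n[2, 2] * log(pAB * pab + pAb * paB)
  }

  # Search strictly inside the admissible range for pAB
  fit <- optimize(loglik,
                  lower = pA * pB + Dmin + eps,
                  upper = pA * pB + Dmax - eps,
                  maximum = TRUE)
  D <- fit$maximum - pA * pB
  c(pA = pA, pB = pB, pAB = fit$maximum, D = D,
    Dprime = if (D >= 0) D / Dmax else D / abs(Dmin))
}

# Invented genotype counts (rows: copies of A, columns: copies of B)
geno <- matrix(c(10,  5,  1,
                  6, 20,  7,
                  1,  6, 12), nrow = 3, byrow = TRUE)
est <- ld_mle(geno)
print(round(est, 4))
```

Only the double heterozygotes carry phase uncertainty, which is why a
one-dimensional search over pAB is enough once the allele frequencies are
fixed.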

The chi-square statistic for testing D=0 reported by LD() is twice what it
should be, and you may be confused (I know I was) by the fact that the
marginal allele frequencies are estimated using the non-missing data for
each locus in turn. This means the bounds (pmin and pmax) for the AB
haplotype frequency differ from those implied by the actual two-locus table
used to maximize the likelihood. So you will get different answers from
programs that use jointly complete observations only.
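
To make the point about the bounds concrete, here is a small base R sketch
with invented data. ab_bounds() is a hypothetical helper written for this
example, and the allele counts are made up; the only thing it is meant to
show is that the two ways of estimating the allele frequencies give
different limits for the AB haplotype frequency when some genotypes are
missing.

```r
# Invented data: copies of allele A and of allele B per individual;
# NA marks a genotype that is missing at that locus.
a <- c(2, 2, 1, 1,  0, 2, 1,  NA, 0, 1, 2,  NA)
b <- c(2, 1, 1, NA, 0, 2, NA, 1,  0, 1, NA, 2)

# Hypothetical helper: limits on the AB haplotype frequency implied by
# the allele frequencies pA and pB.
ab_bounds <- function(pA, pB) {
  c(pmin = pA * pB + max(-pA * pB, -(1 - pA) * (1 - pB)),
    pmax = pA * pB + min(pA * (1 - pB), (1 - pA) * pB))
}

# (1) each locus uses all of its own non-missing genotypes
per_locus <- ab_bounds(mean(a, na.rm = TRUE) / 2,
                       mean(b, na.rm = TRUE) / 2)

# (2) both loci restricted to individuals typed at both
ok    <- complete.cases(a, b)
joint <- ab_bounds(mean(a[ok]) / 2, mean(b[ok]) / 2)

print(rbind(per_locus, joint))
```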

Several other packages for haplotype analysis are on CRAN.  Package
haplo.stats has the haplo.em() function to give the MLEs for the haplotype
frequencies.  From these you can easily calculate D etc.  Package hwde
estimates nonstandard disequilibrium coefficients in a loglinear
framework, and can be used to compare different sample disequilibria.
Note that haplo.stats and hapassoc are aimed specifically at comparing
groups or testing for association with other traits. My package
gllm is not as easy to use but can combine phased and unphased data in
loglinear models -- you could probably use the cat package in the same way.
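
Going from two-locus haplotype frequencies to D, D' and r takes only a few
lines of base R. The frequencies below are assumed values chosen for
illustration, not output from haplo.em() or any other package, and D' is
again given the sign of D.

```r
# Assumed haplotype frequencies (illustrative values, not estimates)
hap <- c(AB = 0.40, Ab = 0.10, aB = 0.15, ab = 0.35)
stopifnot(isTRUE(all.equal(sum(hap), 1)))

pA <- hap[["AB"]] + hap[["Ab"]]      # frequency of allele A
pB <- hap[["AB"]] + hap[["aB"]]      # frequency of allele B
D  <- hap[["AB"]] - pA * pB          # disequilibrium coefficient

# Largest |D| attainable given the allele frequencies and the sign of D
Dlim <- if (D >= 0) {
  min(pA * (1 - pB), (1 - pA) * pB)
} else {
  min(pA * pB, (1 - pA) * (1 - pB))
}
Dprime <- D / Dlim                                # keeps the sign of D
r      <- D / sqrt(pA * (1 - pA) * pB * (1 - pB)) # allelic correlation

print(c(D = D, Dprime = Dprime, r = r))
```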

David Duffy.

| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v