[Rd] R package {genetics} function summary.genotype -- H and PIC return incorrect values. (PR#3642)

gerard.tromp at sanger.med.wayne.edu gerard.tromp at sanger.med.wayne.edu
Mon Aug 4 05:45:09 MEST 2003


Greetings!

I found a bug in the function summary.genotype (details below). Essentially
it appears that the original code from David Duffy used af, allele
frequency, as a proportion but in the genetics package af is allele
frequency as a count.

I tried to submit a bug report together with the fix to CRAN (below). I am
unsure of the success since the bug report does not appear in the list. I am
therefore sending it by e-mail.

Sincerely,

Gerard
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Gerard Tromp, Ph.D.                             vox: 313-577-8773
Center for Molecular Medicine and Genetics      fax: 313-577-7736
Wayne State Univ. Schl. of Medicine
540 E. Canfield, Detroit, MI 48201    gtromp at cmb.biosci.wayne.edu
                                mailto:tromp at sanger.med.wayne.edu
                                     http://cmmg.biosci.wayne.edu



In function summary.genotype:
Heterozygosity (H) and PIC are computed using af (allele frequency),
however, af is derived as a frequency count whereas the computation of H and
PIC are in terms of frequency as a proportion.

Using the example data provided (genetics.R) H and PIC return large negative
numbers whereas they are bounded 0,1.

Output from buggy version:
*************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP

Number persons typed: 100 (100%)

Allele Frequency: (2 alleles)
  Count Proportion
C   137       0.69
T    63       0.32


Genotype Frequency:
    Count Proportion
C/C    59       0.59
C/T    19       0.19
T/T    22       0.22

Heterozygosity (Hu)  = -22967    <------ Impossible
Poly. Inf. Content   = -1.5e+08  <------ Impossible
***************************************************************************

Output from fixed version:
***************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP

Number persons typed: 100 (100%)

Allele Frequency: (2 alleles)
  Count Proportion
C   137       0.69
T    63       0.32


Genotype Frequency:
    Count Proportion
C/C    59       0.59
C/T    19       0.19
T/T    22       0.22

Heterozygosity (Hu)  = 0.44     <--- Correct
Poly. Inf. Content   = 0.34     <--- Correct
***************************************************************************




Fix is detailed below.
***************** 2539,2554 c 2539,2565
********************************************
    ### from code submitted by David Duffy <davidD at qimr.edu.au>
    #
    n.typed<-sum(gf)
    correction<-n.typed/max(1,n.typed-1)
#  Gerard Tromp 20030803
#  Heterozygosity and PIC reported are incorrect.
#  Problem due to use of af which is a count rather than prop.table(af) a
#    proportion (fraction) less than 1. Definition of af changed?
#
##  Original
#    ehet<-(1-sum(af*af))
#   the following produces desired result
    ehet<-(1-sum(prop.table(af)*prop.table(af)))
##  Original
#    matings<- (af %*% t(af))^2
#   the following produces desired result
    matings <- (prop.table(af) %*% t(prop.table(af)))^2
    uninf.mating.freq <- sum(matings)-sum(diag(matings))
    pic<- ehet - uninf.mating.freq

    retval$Hu <- correction * ehet
    retval$pic <- pic
    retval$n.typed <- n.typed
    retval$n.total <- length(object)
    retval$nallele <- nallele(object)
    #
    ###
****************************************************************************
******



More information about the R-devel mailing list