R genetics package now available

Wed Nov 27 15:54:06 CET 2002

The "genetics" package for handling single-locus genetic data is now
available on CRAN in both source and Windows binary formats.  The purpose of
this package is to make it easy to create and manipulate genetic
information, and to facilitate the use of this information in statistical
models.

The package includes classes and methods for creating, representing, and
manipulating genotypes (unordered allele pairs) and haplotypes (ordered
allele pairs).  Genotypes and haplotypes can be annotated with chromosome,
locus, gene, and marker information.  Utility functions compute genotype
and allele frequencies, flag homozygotes or heterozygotes, flag carriers of
certain alleles, count the copies of a specific allele carried by an
individual, extract one or both alleles, estimate and generate confidence
intervals for measures of single-marker disequilibrium, and test for
departure from Hardy-Weinberg equilibrium.
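
To make the terminology concrete, here is a minimal base-R sketch of what
"unordered allele pairs", allele frequencies, homozygote flags, and allele
counts mean for a handful of genotype strings.  It does not use the
genetics package at all; the variable names are made up for illustration
and are not part of the package's interface.

```r
# Base-R illustration only; none of these names come from the genetics package.
g <- c("C/T", "T/C", "C/C", "T/T", "C/T")

# Split each genotype string into its two alleles (one row per individual).
alleles <- do.call(rbind, strsplit(g, "/", fixed = TRUE))

# Unordered pair: sort the two alleles within each individual so that
# "T/C" and "C/T" compare equal.  An ordered pair would skip this step.
unordered <- apply(alleles, 1, function(a) paste(sort(a), collapse = "/"))

# Allele frequencies: count every allele copy across all individuals.
allele_count <- table(alleles)
allele_freq  <- prop.table(allele_count)

# Flag homozygotes and carriers, and count copies of one allele.
is_homozygote <- alleles[, 1] == alleles[, 2]
n_copies_T    <- rowSums(alleles == "T")
carries_T     <- n_copies_T > 0
```

The package wraps this kind of bookkeeping in classes, so that marker
annotation travels with the data and the results can be used directly in
model formulas.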

The package description file and a simple example are appended below.
Comments and contributions are, of course, welcome.

-Greg

DESCRIPTION
===========

Package: genetics
Title: Population Genetics
Version: 0.6.4
Date: 2002-11-13
Author: Gregory Warnes and Friedrich Leisch
Maintainer: Gregory Warnes <gregory_r_warnes at groton.pfizer.com>
Depends: combinat
Description: Classes and methods for handling genetic data. Includes
        classes to represent genotypes and haplotypes at single
        markers up to multiple markers on multiple chromosomes.
        Functions include allele frequencies, flagging
        homo/heterozygotes, flagging carriers of certain alleles,
        computing disequilibrium, testing Hardy-Weinberg equilibrium,
        ...
License: GPL
Built: R 1.6.0; sparc-sun-solaris2.8; Tue Nov 12 15:43:20 EST 2002

Index:

HWE.test                Estimate Disequilibrium and Test for
                        Hardy-Weinberg Equilibrium
ci.balance              Experimental Function to Correct Confidence
                        Intervals At or Near Boundaries of the
                        Parameter Space by 'Sliding' the Interval on
                        the Quantile Scale.
diseq                   Estimate or Compute Confidence Interval for the
                        Disequilibrium Parameter
genotype                Genotype or Haplotype Objects.
homozygote              Extract Features of Genotype objects
locus                   Create and Manipulate Locus, Gene, and Marker
                        Objects
summary.genotype        Allele and Genotype Frequency from a Genotype
                        or Haplotype Object
undocumented            Undocumented functions

SIMPLE EXAMPLE
==============

> library(genetics)
Attaching package `genetics':

        The following object(s) are masked from package:base :

         as.factor 

> ## Create a sample dataset with 3 SNP markers
> 
> ## Note: sample() treats 'prob' as weights, so they need not sum to 1
> g1 <- sample( x=c('C/C', 'C/T', 'T/T'), 
+               prob=c(.6,.2,.2), size=20, replace=TRUE)
> g2 <- sample( x=c('A/A', 'A/G', 'G/G'), 
+               prob=c(.6,.1,.5), size=20, replace=TRUE)
> g3 <- sample( x=c('C/C', 'C/T', 'T/T'), 
+               prob=c(.2,.4, 4), size=20, replace=TRUE)
> 
> y <- rnorm(20) + (g1=='C/C') + 
+      0.25 * (g2=='A/A' | g2=='A/G')
> 
> ## Form into a data frame
> data <- data.frame( y, g1, g2, g3)
> 
> # Create marker labels for the data 

[...]

> a1691g  <- marker(name="A1691G",
+                  type="SNP",
+                  locus.name="MBP2",
+                  chromosome=9, 
+                  arm="q", 
+                  index.start=35,
+                  bp.start=1691,
+                  relative.to="intron 1")
> 
> 

[...]

> 
> data$g1 <- genotype(data$g1, locus=c104t)
> data$g2 <- genotype(data$g2, locus=a1691g)
> data$g3 <- genotype(data$g3, locus=c2249t)
> 
> data
              y  g1  g2  g3
1  -0.084796634 T/T G/G T/C
2   1.454537575 C/C G/G T/T
3  -0.899625344 T/T G/G T/T
4  -1.980679630 C/T A/A T/T
5   0.231087028 C/T A/A T/T
6   2.588083646 C/C A/A T/C
7   0.209338731 C/C A/A T/T
8   1.435823157 C/T G/G T/T
9  -0.078796949 C/C G/G T/T
10 -2.091110058 C/T A/A T/T
11 -0.842655686 C/T G/G T/T
12  1.316828279 C/C G/G T/T
13  0.470126626 C/T A/A T/T
14 -0.364828611 T/T G/A T/T
15 -0.002438264 C/T A/A T/C
16  0.949432430 C/C G/G T/T
17 -0.096626850 C/T G/A T/T
18  1.065637984 T/T A/A T/T
19  0.817213289 C/C A/A T/T
20  0.644714638 C/T G/G T/T
>
> data$g2
Marker: MBP2:A1691G (9q35:1691) Type: SNP
 [1] "G/G" "G/G" "G/G" "A/A" "A/A" "A/A" "A/A" "G/G" "G/G" "A/A" "G/G" "G/G"
[13] "A/A" "G/A" "A/A" "G/G" "G/A" "A/A" "A/A" "G/G"
Alleles: G A 
> 
> summary(data$g2)

Marker: MBP2:A1691G (9q35:1691) Type: SNP

Allele Frequency:
  Count Proportion
A    20        0.5
G    20        0.5

Genotype Frequency:
    Count Proportion
A/A     9       0.45
G/A     2       0.10
G/G     9       0.45

> HWE.test(data$g2)

 -----------------------------------
 Test for Hardy-Wienburg-Equilibrium
 -----------------------------------

Call: 
HWE.test.genotype(x = data$g2)

Raw Disequlibrium for each allele pair (D)

       G    A
  G      -0.2
  A -0.2     

Scaled Disequlibrium for each allele pair (D')

       G    A
  G      -0.8
  A -0.8     

Correlation coefficient for each allele pair (r)

      G   A
  G 1.0 0.8
  A 0.8 1.0

Overall Values (mean absolute-value weighted by expected allele frequency)

     Value
  D   -0.2
  D'  -0.8
  r    0.8

Confidence intervals computed via bootstrap using 1000 samples

             Observed   95% CI                   NA's Contains Zero?
  Overall D  -0.2000000 (-0.2475000, -0.1093750) 0    *NO*          
  Overall D' -0.8000000 (-1.0000000, -0.4666667) 0    *NO*          
  Overall r   0.8000000 ( 0.4666667,  1.0000000) 0    *NO*          

Significance Test:

 Pearson's Chi-squared test with simulated p-value (based on 10000
 replicates)

data:  data$g2 
X-squared = 12.8, df = NA, p-value = 7e-04

>
> summary(lm( y ~ homozygote(g1,'C') +
+                 allele.count(g2, 'G') +
+                 g3, data=data))

Call:
lm(formula = y ~ homozygote(g1, "C") + allele.count(g2, "G") + 
    g3, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6686 -0.6625 -0.0172  0.6973  1.6196 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)  
(Intercept)               0.3499     0.6229   0.562   0.5821  
homozygote(g1, "C")TRUE   1.2124     0.4778   2.537   0.0220 *
allele.count(g2, "G")     0.1193     0.2429   0.491   0.6298  
g3T/T                    -0.7724     0.6414  -1.204   0.2460  
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 1.013 on 16 degrees of freedom
Multiple R-Squared: 0.3405,     Adjusted R-squared: 0.2169 
F-statistic: 2.754 on 3 and 16 DF,  p-value: 0.07661
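

As a sanity check on the HWE.test output for g2 above, the point estimates
can be reproduced by hand from the genotype counts printed by
summary(data$g2).  The sketch below uses only base R.  The formulas are the
usual textbook definitions, chosen because they reproduce the printed
values; that the package computes them in exactly this way is an
assumption, not something taken from its source.

```r
# Genotype counts taken from the summary(data$g2) output above.
n_AA <- 9; n_GA <- 2; n_GG <- 9
n <- n_AA + n_GA + n_GG

# Allele frequencies: each homozygote contributes two copies.
p_A <- (2 * n_AA + n_GA) / (2 * n)
p_G <- (2 * n_GG + n_GA) / (2 * n)

# Assumed definition of D: half the gap between the observed and the
# expected heterozygote frequency (negative when heterozygotes are scarce).
obs_het <- n_GA / n
exp_het <- 2 * p_A * p_G
D <- (obs_het - exp_het) / 2

# Assumed scaling for D': divide by the largest magnitude D can reach.
D_max   <- min(p_A * p_G, (1 - p_A) * (1 - p_G))
D_prime <- D / D_max

# Assumed definition of r: -D over the product of the allele std. deviations.
r <- -D / sqrt(p_A * (1 - p_A) * p_G * (1 - p_G))

# Pearson chi-squared statistic against Hardy-Weinberg expected counts.
observed <- c(n_AA, n_GA, n_GG)
expected <- n * c(p_A^2, 2 * p_A * p_G, p_G^2)
X2 <- sum((observed - expected)^2 / expected)

print(c(D = D, D_prime = D_prime, r = r, X2 = X2))
```

With 9, 2, and 9 individuals in the three genotype classes, these
expressions agree with the D, D', r, and X-squared values shown in the
transcript.  The bootstrap confidence intervals and the simulated p-value
depend on random resampling and are not reproduced here.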