%\VignetteIndexEntry{pedgene_manual} %\VignetteDepends{pedgene} %\VignetteKeywords{pedigree, variant, kinship} %\VignettePackage{pedgene} %************************************************************************** % % # $Id:$ % $Revision: $ % $Author: Jason Sinnwell$ % $Date: $ \documentclass[letterpaper]{article} \usepackage{Sweave} \textwidth 7.5in \textheight 8.9in \topmargin 0.1in \headheight 0.0in \headsep 0.0in \oddsidemargin -30pt \evensidemargin -30pt <>= options(width = 100) desc <- packageDescription("pedgene") @ \title{pedgene package vignette \\ Version \Sexpr{desc$Version}} \author{Jason Sinnwell and Daniel Schaid} \begin{document} \maketitle \section{Introduction} This document is a brief tutorial for the pedgene package, which implements methods described in Schaid et al., 2013~\cite{Schaid}, entitled {\sl ``Multiple Genetic Variant Association Testing by Collapsing and Kernel Methods With Pedigree or Population Structured Data''}. We begin by showing the data structure and requirements of the input data, and continue with examples on running the method. If the pedgene package is not loaded, load it now. <>= library(pedgene) @ \section{Example Data} Three example data frames are provided within the pedgene package: \begin{itemize} \item{\em example.ped: }{ data.frame with columns ped, person, father, mother, sex, and trait; for 3 families of 10 subjects each, and 9 unrelated subjects} \item{\em example.geno:}{ data.frame with ped and person to match lines to example.ped, and minor allele count for 20 simulated variant positions} \item{\em example.map: }{data frame with columns for gene name and chromosome matching columns in example.geno} \end{itemize} \noindent Load the example data and look at some of the values. <>= data(example.ped) example.ped[c(1:10,31:39),] data(example.geno) dim(example.geno) example.geno[,c(1:2,10:14)] data(example.map) example.map @ Note the following features: \begin{itemize} \item The pedigree data has more subjects than subjects that have genotypes, to keep pedigrees connected. \item Subjects with genotype data are matched to the pedigree data by the ped and person IDs in both data sets \item The number of non-id columns in the genotype data frame must match the number of rows in the map data frame \item The two genes provided are actually the same simulated data, but we show they are analyzed differently for chromosome 1 versus chromosome X \end{itemize} \section{Case-Control} \subsection*{No covariate adjustment} We perform a typical use of pedgene with default settings on the case-control status provided in the example data. <>= pg.cc <- pedgene(ped=example.ped, geno=example.geno, map=example.map) print(pg.cc) @ \noindent The results show the output for the two genes in a table, containing gene, chromosome, the number of variants used, the non-informative variants that were removed, the kernel statistic and p-value, and the burden statistic and p-value. The kernel p-value is calculated using Kounen's saddlepoint method~\cite{Kounen}, and the burden p-value is based on the normal distribution. \subsection*{With covariate adjustment} The pedgene function can work with case-control data and an adjusted trait using a covariate. The data contains the sex variable, so we use it in a logistic regression to get adjusted fitted values for the trait, which we add to example.ped at a new column named {\sl trait.adjusted}, which pedgene recognizes and uses in calculating residuals. <>= binfit <- glm(trait ~ (sex-1),data=example.ped, family="binomial") example.ped$trait.adjusted[!is.na(example.ped$trait)] <- fitted.values(binfit) example.ped[1:10,] pg.cc.adj <- pedgene(ped=example.ped, geno=example.geno, map=example.map) summary(pg.cc.adj, digits=4) @ \noindent The results are now shown from the summary method, which adds the call to the pedgene function. The results are different because we now input the fitted values from the logistic regression, which of course can be a more complex model than what we use here. \section{Continuous Trait} \subsection*{No covariate adjustment} The pedgene method works similarly for continous traits. We first create a different version of the pedigree data to have a continous trait, which we simulate. We simulate complete trait data for all subjects, but pedgene will only use the subjects for which we have genotype data. <>= set.seed(10) cont.ped <- example.ped[,c("famid", "person", "father", "mother", "sex")] beta.sex <- 0.3 cont.ped$trait <- (cont.ped$sex-1)*beta.sex + rnorm(nrow(cont.ped)) pg.cont <- pedgene(ped = cont.ped, geno = example.geno, map = example.map) print(pg.cont, digits=4) @ \noindent The results are now non-significant because we simulated a trait that is slightly influenced by sex, but not the genotype data. \subsection*{With covariate adjustment} Now we add the {\sl trait.adjusted} column to the cont.ped data set and run pedgene again. <>= gausfit <- glm(trait ~ (sex-1),data=cont.ped, family="gaussian") cont.ped$trait.adjusted <- fitted.values(gausfit) cont.ped[1:10,] pg.cont.adj <- pedgene(ped=cont.ped, geno=example.geno, map=example.map) summary(pg.cont.adj, digits=5) @ \section{Remarks} \begin{itemize} \item The pedgene package is deigned to be rigid in what it expects for pedigree and genotype data, which puts a little extra burden on the user. However it should reduce confusion when running the method. \item One potential run-time bottleneck is a set of pedigree checks performed on all pedigrees. If the pedigrees are clean and you have many subjects, you may wish to skip this step with the argument checkpeds=FALSE. \item By default, the method uses Madsen-Browning weights~\cite{Madsen}, but it allows user-defined weights per variant with the weights argument. \end{itemize} \section{R Session Information} <>= toLatex(sessionInfo()) @ \begin{thebibliography}{} \bibitem{Schaid} Schaid DJ, McDonnell SK., Sinnwell JP, Thibodeau SN. (2013) {\bf Multiple Genetic Variant Association Testing by Collapsing and Kernel Methods With Pedigree or Population Structured Data} {\em Genet Epidemiol}, 37(5):409-18. \bibitem{Kounen} Kounen D (1999) {\bf Saddle point approximatinos for distributions of quadratic forms in normal variables} {\em Biometrika}, 86:929 -935. \bibitem{Davies} Davies RB (1980) {\bf Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables} {\em Journal of the Royal Statistical Society. Series C (Applied Statistics)}, 29(3):323-33 \bibitem{Madsen} Madsen BE, Browning SR (2009). {\bf A groupwise association test for rare mutations using a weighted sum statistic} {\em PLoS Genet} 5(2):e1000384. \end{thebibliography} \end{document}