[R] genotype analysis

Anne-Marie Ternes amternes at gmail.com
Wed Mar 26 11:01:26 CET 2008


Dear mailing list,

I'm still quite a newbie in the statistical analysis of
genotype/allele data and, more generally, in the analysis of
categorical variables. Moreover, I'm currently totally confused by the
many R packages available for this kind of analysis.

Here is my case: I have a list of genes and a number of case-control
population pairs, and, for each population and gene, the genotypes
that were found. I have both aggregate data (e.g. gene1: wild-type
homozygotes: 201, heterozygous mutation carriers: 34, homozygous
mutation carriers: 5) and individual-level data for each gene (e.g. for
gene1, a list of genotypes such as "V/V", "V/I", "I/I", etc.).

The question is whether the mutation pattern differs between the case
and control groups in a way that influences the outcome, both for each
gene individually and for combinations of genes. I would also like to
check for linkage disequilibrium (LD), as I know that some of these
genes are located quite close to each other on the chromosome.
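For reference, the usual pairwise LD measures (D, D' and r²) are computed from two-locus haplotype frequencies. The sketch below assumes the haplotypes are already known (phased), with alleles A/a at the first locus and B/b at the second. With unphased genotype data like that described above, the haplotype frequencies would first have to be estimated, typically by maximum likelihood (e.g. via the EM algorithm). The function name is hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from counts of the four two-locus haplotypes.

    Assumes phased haplotype counts are available (hypothetical input).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B     # deviation from independence of the two loci
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

print(ld_measures(50, 0, 0, 50))    # only AB and ab haplotypes: complete LD
print(ld_measures(25, 25, 25, 25))  # all four haplotypes equally common: no LD
```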

So far I have been using chi-square tests, McNemar's test for matched
pairs, and Fisher's exact test when the counts were too small.
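For a single gene, the case-control comparison is a chi-square test of independence on a 2 x 3 table (group by genotype class). A minimal Python sketch, assuming the counts are given in the order wild-type homozygote, heterozygote, mutant homozygote; the function name and the counts are hypothetical. With 2 rows and 3 columns the test has (2-1)(3-1) = 2 degrees of freedom, and the chi-square upper tail with 2 df is exactly exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_2x3(cases, controls):
    """Pearson chi-square test of independence for a 2 x 3 genotype table."""
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected three genotype classes per group")
    rows = [cases, controls]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(cases, controls)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence of group and genotype.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = 2, so P(X > stat) = exp(-stat / 2) exactly.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical counts, for illustration only.
stat, p = chi_square_2x3([20, 10, 10], [10, 20, 10])
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A genotype class that is empty in both groups makes an expected count zero, and the function then raises ZeroDivisionError; when expected counts fall below about 5, the chi-square approximation is unreliable, which is the usual reason for switching to Fisher's exact test.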

As for the LD question, if I have understood correctly, I need to use
log-linear regression. I have tried several R packages and am now
quite confused, because I don't know which one is best suited to my
problem. I should add that I am also new to log-linear regression...
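To illustrate the log-linear idea in its simplest form: for a two-way table, the independence model is log(mu_ij) = lambda + lambda_i(locus 1) + lambda_j(locus 2). Its fitted counts are (row total x column total) / n, and its deviance G² = 2 Σ O·log(O/E) tests whether the two factors are associated. The sketch below applies this to a hypothetical 3 x 3 table of genotypes at two loci. It is a rough stand-in, not the hwde analysis: a significant G² only says that the genotypes at the two loci are associated. The p-value uses the closed-form chi-square tail, which holds only for even degrees of freedom (a 3 x 3 table has 4). Function names and counts are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; exact closed form, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

def loglinear_independence_g2(table):
    """Deviance G^2 of the log-linear independence model for a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n  # maximum likelihood fit under independence
            if obs > 0:                       # cells with obs = 0 contribute 0
                g2 += 2.0 * obs * math.log(obs / expected)
    df = (len(rows) - 1) * (len(cols) - 1)
    return g2, df, chi2_sf_even_df(g2, df)

# Hypothetical genotype counts: rows = locus 1 genotype, columns = locus 2 genotype.
example_table = [[40, 15, 2], [12, 20, 5], [3, 6, 7]]
g2_value, df_value, p_value = loglinear_independence_g2(example_table)
print(f"G2 = {g2_value:.2f} on {df_value} df, p = {p_value:.4g}")
```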

I have used "hwde" and read the paper on which it is based (see the
hwde documentation), but the package leaves out certain output rows
that are shown in the paper, and, unlike the paper, it doesn't
indicate which of the output rows are significant. Is there a simple
way to interpret the "hwde" output (something like a p-value)?
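For comparison, a p-value for Hardy-Weinberg equilibrium at a single gene can be obtained from the classical chi-square goodness-of-fit test. This does not reproduce the hwde output; it is a simpler, swapped-in check, sketched in Python under the assumption that the three genotype counts are given in the order wild-type homozygote, heterozygote, mutant homozygote. The gene1 counts quoted above are used as input. With three classes and one estimated allele frequency the test has 1 degree of freedom, and the upper tail of a chi-square with 1 df is erfc(sqrt(x/2)). The function name is hypothetical.

```python
import math

def hwe_chi_square(n_hom_wt, n_het, n_hom_mut):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df)."""
    n = n_hom_wt + n_het + n_hom_mut
    p = (2 * n_hom_wt + n_het) / (2 * n)  # wild-type allele frequency
    q = 1.0 - p                           # mutant allele frequency
    observed = (n_hom_wt, n_het, n_hom_mut)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE proportions p^2, 2pq, q^2
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, expected

stat, p_value, expected = hwe_chi_square(201, 34, 5)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
print("expected counts:", [round(e, 1) for e in expected])
```

Here the expected number of mutant homozygotes is only about 2, so the chi-square approximation is unreliable and an exact test would be the safer choice; a monomorphic gene (q = 0) is not handled.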

Then there are the "GeneticsBase", "Genetics", "mapLD" and
"Hardy-Weinberg" packages. Some work only for a single gene; some use
maximum likelihood estimation ("MLE"), others generalized linear
models, etc.

I know these are as much basic statistics questions as R questions.
But I'd be glad if you could help me find the best approach for my
type of analysis, or point me to good resources that show how to do
it. The problem is that most resources show "how to" run the
analysis, but don't explain at all how to *interpret* the output.

Thanks a lot in advance,

Anne-Marie
