[BioC] GWASTools: quasi-/perfect linear separation

Thu Sep 4 12:56:32 CEST 2014

This is not really a question but more of a warning to other users.

I have performed a regression analysis using the assocTestRegression function under three different models (dominant, recessive, additive). My data set contains ~3 million markers, filtered so that only SNPs with MAF >= 10% are included. Please note that this filter was applied to cases and controls together as one data set (i.e. I did not filter cases and controls separately).

When I examined the results of the association under the recessive model, I noticed very large beta estimates (8-9). Looking at the genotype counts, I realised that this is because some SNPs show (quasi-)perfect linear separation: for example, the AA genotype has a count of 0 in cases and 170 in controls, which leads to inflated estimates.
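
To illustrate why the estimate blows up, here is a minimal Python sketch (not GWASTools code). It assumes a model with no covariates, where the recessive-model beta is the log odds ratio of the 2x2 genotype-by-status table. Only the AA counts (0 and 170) come from my data; the non-AA counts and the function name log_odds_ratio are made up for illustration.

```python
import math

def log_odds_ratio(aa_cases, aa_controls, other_cases, other_controls,
                   correction=0.0):
    """Log odds ratio of a 2x2 table; `correction` is added to every cell."""
    a, b, c, d = (x + correction for x in
                  (aa_cases, aa_controls, other_cases, other_controls))
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan      # table is degenerate, odds ratio undefined
    if den == 0:
        return math.inf      # odds ratio diverges upwards
    if num == 0:
        return -math.inf     # odds ratio is exactly zero
    return math.log(num / den)

# AA: 0 cases, 170 controls (from the post); non-AA counts are hypothetical.
raw = log_odds_ratio(0, 170, 1000, 830)
# Adding 0.5 to each cell (Haldane-Anscombe correction) gives a finite value.
corrected = log_odds_ratio(0, 170, 1000, 830, correction=0.5)
print(raw, corrected)
```

With an empty cell the uncorrected value is infinite in magnitude, so a large finite beta reported by an iterative fit most likely only reflects where the optimiser stopped. The sign depends on how the outcome and genotype are coded.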

I was surprised to find that the function neither throws a warning nor drops the SNPs where this occurs.
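
Until such a check exists, the affected SNPs can be screened from the genotype counts before trusting the results. The Python sketch below is only an illustration under stated assumptions: the helper names collapse and has_empty_cell are hypothetical, counts are given as (AA, AB, BB) with A taken as the effect allele, and no covariates are considered. It covers only the dominant and recessive codings, where an empty cell in the collapsed 2x2 table means the genotype effect cannot be estimated. For the additive coding an empty cell does not by itself imply separation, so that model is left out.

```python
def collapse(counts, model):
    """Collapse (AA, AB, BB) counts into the two groups a model compares."""
    aa, ab, bb = counts
    if model == "recessive":   # AA versus AB + BB
        return (aa, ab + bb)
    if model == "dominant":    # AA + AB versus BB
        return (aa + ab, bb)
    raise ValueError("only 'recessive' and 'dominant' are handled here")

def has_empty_cell(case_counts, control_counts, model="recessive"):
    """True if any cell of the collapsed 2x2 table is zero."""
    cells = collapse(case_counts, model) + collapse(control_counts, model)
    return any(n == 0 for n in cells)

# AA counts (0 cases, 170 controls) are from the post; the rest is hypothetical.
cases = (0, 400, 600)
controls = (170, 380, 450)
print(has_empty_cell(cases, controls, "recessive"))
print(has_empty_cell(cases, controls, "dominant"))
```

SNPs flagged this way could then be dropped for that model or re-analysed with a method that tolerates separation.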

Regards,
Danica

--
Sent via the guest posting facility at bioconductor.org.