[R] NA as a result of using GLM

Marc Schwartz marc_schwartz at me.com
Mon Jun 15 16:57:34 CEST 2009


On Jun 15, 2009, at 5:54 AM, Paul Christoph Schröder wrote:

> Hi all!
> Maybe someone could help me with the following. I know this isn't
> directly related to ecology, but I'm also using glm.
>
> I have a list of 16 genes and 10 samples. The samples are of two  
> types, 4 Ctrl and 6 Diseased. If,
>
> labelInd<-as.factor(c(rep("0",4),rep("1",6)))
> genes.glm<-glm(labelInd ~ ., family=binomial, data=mat)
>
>
> "mat" being the 10x16 matrix (without NAs), I get 17 values: first
> the intercept, then 9 numerical values, and "NA" for the last 7 genes.
> Does anybody know why this is happening, or how I can fit a model using
> all 16 genes?
>
> I hope someone can help me with this!
> Many thanks in advance,
>
> Paul

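More than likely, the 7 genes for which you are getting NAs are
collinear with (that is, linear combinations of) the genes that precede
them in the formula. With only 10 observations, the model matrix can
have rank at most 10, so at most 10 coefficients (here the intercept
plus 9 genes) can be estimated and the rest are reported as NA. If you
switched the order of the genes so that those 7 come first in the
formula, you would get NAs for others instead.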
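
To illustrate the point about rank and ordering, here is a minimal sketch. It uses simulated random data as a stand-in for your matrix (the values, the seed and the "gene1".."gene16" column names are my assumptions, not your data), so only the pattern of NAs is meaningful, not the estimates:

```r
# Simulated stand-in for the poster's data: 10 samples (rows) x 16 genes (columns).
set.seed(1)
mat <- matrix(rnorm(10 * 16), nrow = 10,
              dimnames = list(NULL, paste0("gene", 1:16)))
labelInd <- factor(c(rep("0", 4), rep("1", 6)))

# The design matrix (intercept + 16 genes) has 17 columns but only 10 rows,
# so its rank cannot exceed 10.
X <- model.matrix(~ ., data = as.data.frame(mat))
qr(X)$rank

# Warnings about fitted probabilities of 0 or 1 are expected with data
# like these (the model can separate the groups perfectly); suppressed here.
fit <- suppressWarnings(
  glm(labelInd ~ ., family = binomial, data = as.data.frame(mat))
)
names(which(is.na(coef(fit))))

# Reversing the column order moves the NAs to a different set of genes.
fit_rev <- suppressWarnings(
  glm(labelInd ~ ., family = binomial, data = as.data.frame(mat[, 16:1]))
)
names(which(is.na(coef(fit_rev))))
```

The genes that come out as NA are simply the ones that appear last in the formula, which is why reordering changes them.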

If you use:

   summary(genes.glm)

you will likely see a note about singularities in the
coefficient table header line. Something like:

   Coefficients: (7 not defined because of singularities)

I would use cor(mat) to take a look at the correlation matrix for your  
data so that you can review this in more detail.
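
As a rough sketch of how you might scan that correlation matrix, the following lists gene pairs whose absolute correlation exceeds a cutoff. The data are again simulated, one column is deliberately made an exact multiple of another so that there is something to find, and the 0.9 cutoff is an arbitrary choice of mine:

```r
# Simulated stand-in data: 10 samples x 16 genes (assumed, not the real data).
set.seed(1)
mat <- matrix(rnorm(10 * 16), nrow = 10,
              dimnames = list(NULL, paste0("gene", 1:16)))
# Deliberately make gene16 an exact multiple of gene1 for illustration.
mat[, "gene16"] <- 2 * mat[, "gene1"]

r <- cor(mat)                         # 16 x 16 correlation matrix of the genes
r[upper.tri(r, diag = TRUE)] <- NA    # keep each pair once, drop the diagonal

# Row/column positions of pairs with |r| above the (arbitrary) 0.9 cutoff.
high <- which(abs(r) > 0.9, arr.ind = TRUE)
data.frame(gene_a = rownames(r)[high[, "row"]],
           gene_b = colnames(r)[high[, "col"]],
           r      = r[high])
```

Keep in mind that pairwise correlations only reveal dependence between two genes at a time. A gene that is a linear combination of several others can produce an NA without any single correlation being close to 1.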

BTW, with only 10 observations, you are severely overfitting the  
model by using so many covariates. You typically need at least 10 to  
20 'events' for each covariate degree of freedom in a logistic  
regression model. With only 6 diseased (events) you really don't even  
have enough data to support one covariate. The study, presuming an 'a  
priori' design, is way underpowered for what you are attempting to do.
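
The arithmetic behind that rule of thumb can be checked in a couple of lines. The counts are taken from your description; the variable names are just for illustration:

```r
n_events     <- 6    # diseased samples
n_covariates <- 16   # genes entered as covariates

# Events available per covariate in the model as specified.
epv <- n_events / n_covariates
epv

# Number of covariates the data could support at 10 and at 20 events each.
supported <- n_events / c(10, 20)
supported
```

Both values in `supported` are below 1, which is the sense in which the data do not support even a single covariate.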

HTH,

Marc Schwartz



