[R] X matrix deemed to be singular and cbind

Bert Gunter gunter.berton at gene.com
Fri Jul 26 17:22:00 CEST 2013


Soumitro:

Have you read "An Introduction to R"? If not, please do, as some of your
confusion appears to stem from basic concepts (e.g. factors) that are
explained there.

1. Presumably your categorical variables are factors, not character
vectors. If so, when you cbind() them, you cbind their integer codes,
yielding numeric variables. This produces an incorrect design matrix in
the fit: 1 df per categorical variable instead of 1 less than the
number of levels. See also ?cbind.
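A minimal sketch of point 1, using two small hypothetical factors (risk, size) rather than the original poster's data. It shows that cbind() keeps only the integer codes, whereas putting the factors in a model formula lets model.matrix() (which, as far as I know, is roughly what formula-based fitters such as coxph() rely on) expand each factor into (number of levels - 1) indicator columns.

```r
# Hypothetical example data: two categorical predictors stored as factors.
d <- data.frame(
  risk = factor(c("LOW", "NEUTRAL", "HIGH", "LOW", "HIGH", "NEUTRAL"),
                levels = c("LOW", "NEUTRAL", "HIGH")),
  size = factor(c("small", "large", "small", "large", "small", "large"))
)

# cbind() discards the factor class and keeps only the integer codes,
# so each factor becomes a single numeric column (1 df per variable).
m <- cbind(d$risk, d$size)
print(m)
print(is.numeric(m))   # TRUE: the level labels are gone
print(ncol(m))         # 2

# With the factors in a formula, model.matrix() builds treatment contrasts:
# (number of levels - 1) indicator columns per factor, plus an intercept.
X <- model.matrix(~ risk + size, data = d)
print(colnames(X))     # "(Intercept)" "riskNEUTRAL" "riskHIGH" "sizesmall"
```

A Cox model has no intercept, so in that setting risk would contribute 2 df and size 1 df, rather than 1 df each as with the cbind() version.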

2. Without cbind(), you get the correct design matrix, but you are
overfitting, presumably because your categorical variables have many
different levels. I suggest you consult a local statistician to decide
how best to handle this, as you seem to be out of your depth with
regard to model fitting.
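One common way such a design matrix becomes singular is when some columns are exact linear combinations of others. The sketch below uses hypothetical factors country and region (each country lies in exactly one region), which is only an illustration; whether this is what happens in the original data is unknown. A pivoted QR decomposition of the model matrix exposes the numerical rank and the aliased columns; in my understanding, coxph() reports such columns with NA coefficients together with the "X matrix deemed to be singular" warning.

```r
# Hypothetical data: 'country' and 'region' are both factors, but every
# country lies in exactly one region, so 'region' adds no new information.
d <- data.frame(
  country = factor(c("A", "B", "C", "D", "A", "B", "C", "D")),
  region  = factor(c("East", "East", "West", "West",
                     "East", "East", "West", "West"))
)
X <- model.matrix(~ country + region, data = d)
print(colnames(X))   # "(Intercept)" "countryB" "countryC" "countryD" "regionWest"

# qr() uses a limited pivoting strategy: columns that are (numerically)
# linear combinations of earlier ones are moved past the rank.
qx <- qr(X)
cat("columns:", ncol(X), " rank:", qx$rank, "\n")
aliased <- colnames(X)[qx$pivot[-seq_len(qx$rank)]]
print(aliased)       # "regionWest", which equals countryC + countryD
```

The same check applied to the full model matrix of a real fit would list every redundant column at once, which can help decide which variables or levels to merge or drop.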

... unless I have misunderstood, of course.

Cheers,
Bert

On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey <soumitrodey1 at gmail.com> wrote:
> Hi list,
>
> While the "X matrix deemed to be singular" question has been answered on
> the list quite a few times, I have a twist to it.
>
> I am using the coxph model for survival analysis on a dataset containing
> over 160,000 instances and 46 independent variables and I have 2 scenarios:
>
> 1. If I use cbind() on the 46 independent variables (many of which are
> categorical), coxph runs without complaint. However, it won't report
> which levels of the categorical variables (e.g. VERY HIGH, HIGH, NEUTRAL,
> LOW or VERY LOW) are actually meaningful/significant (e.g. XHIGH ***,
> XLOW ., etc.). Is there any way to check this?
>
> 2. If I don't use cbind(), hoping it will give me the details I was looking
> for in scenario 1, coxph reports "X matrix deemed to be singular"; more
> precisely: "X matrix deemed to be singular; variable 130
> 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
> 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
> 169 170 171 172 173 174 175 176 177 178 179 180 181"
>
> Could anyone please elaborate on how to get around problem #1 or #2?
>
> Thanks!
> SD
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
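The warning quoted above identifies problem columns only by number. A minimal sketch for mapping such numbers back to the variables they came from, assuming (not confirmed by the thread) that the numbers count columns of the model matrix without the intercept, since a Cox model has none. The data, the variable names grade and x, and the reported indices are all hypothetical.

```r
# Hypothetical data: one 5-level factor and one numeric covariate.
d <- data.frame(
  grade = factor(c("VERY LOW", "LOW", "NEUTRAL", "HIGH", "VERY HIGH", "LOW"),
                 levels = c("VERY LOW", "LOW", "NEUTRAL", "HIGH", "VERY HIGH")),
  x = c(0.5, 1.2, 3.1, 2.2, 0.9, 1.7)
)
f <- ~ grade + x
X <- model.matrix(f, data = d)

# Assumption: the warning counts columns of the intercept-free matrix.
keep <- colnames(X) != "(Intercept)"
col_names <- colnames(X)[keep]
term_index <- attr(X, "assign")[keep]          # term number of each column
term_labels <- attr(terms(f), "term.labels")   # "grade" "x"

# Hypothetical indices standing in for those listed in the warning.
reported <- c(3, 4)
res <- data.frame(column = col_names[reported],
                  variable = term_labels[term_index[reported]],
                  stringsAsFactors = FALSE)
print(res)   # gradeHIGH / grade, gradeVERY HIGH / grade
```

Tabulating the variables behind a long list like 130 ... 181 this way shows whether the redundant columns come from one or two high-cardinality factors.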



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


