[R] count data as independent variable in logistic regression

Bert Gunter gunter.berton at gene.com
Tue Oct 2 18:28:07 CEST 2012


This is not primarily an R question, although I grant you that it
might intersect packages in R that do what you want. Nevertheless, I
think you would do better posting on a statistical list, like
stats.stackexchange.com . Maybe once you've figured out there what you
want, you can come back to R to find an implementation.

Cheers,
Bert

On Tue, Oct 2, 2012 at 9:10 AM,  <vlagani at ics.forth.gr> wrote:
>
> Dear R users,
>
> I would like to employ count data as covariates while fitting a logistic
> regression model. My question is:
>
> do I violate any assumption of the logistic (and, more generally, of the
> generalized linear) models by using non-negative integer count variables
> as independent variables?
>
> I found a lot of references in the literature regarding how to use count
> data as outcome, but not as covariates; see for example the very clear
> paper: "N E Breslow (1996) Generalized Linear Models: Checking Assumptions
> and Strengthening Conclusions, Congresso Nazionale Societa Italiana di
> Biometria, Cortona June 1995", available at
> http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf.
>
> Loosely speaking, it seems that glm assumptions may be expressed as follows:
>
> iid residuals;
> the link function must correctly represent the relationship between the
> dependent and independent variables;
> absence of outliers
>
> Does anybody know whether there is any other assumption or technical
> problem that would suggest using some other type of model for dealing with
> count covariates?
>
> Finally, please note that my data contain relatively few samples (<100)
> and that the ranges of the count variables can differ by 3-4 orders of
> magnitude (e.g. some variables take values in the range 0-10, while others
> may take values within 0-10000).
>
> A simple code example follows:
>
> ###########################################################
>
> # generate simulated data (seed fixed so the example is reproducible)
> set.seed(123)
> var1 <- sample(0:10, 100, replace = TRUE)
> var2 <- sample(0:1000, 100, replace = TRUE)
> var3 <- sample(0:100000, 100, replace = TRUE)
> outcome <- sample(0:1, 100, replace = TRUE)
> dataset <- data.frame(outcome, var1, var2, var3)
>
> # fit the model, using all other columns as covariates
> model <- glm(outcome ~ ., family = binomial, data = dataset)
>
> # inspect the model
> print(model)
>
> ###########################################################
>
> Regards,
>
> --
> Vincenzo Lagani
> Research Fellow
> BioInformatics Laboratory
> Institute of Computer Science
> Foundation for Research and Technology - Hellas
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
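The question notes that the count covariates span 3-4 orders of magnitude. In a logistic GLM the covariates enter only through the linear predictor, so multiplying a covariate by a constant divides its coefficient by that constant and leaves the fitted probabilities unchanged. A minimal R sketch of that point, reusing the simulated setup from the question; the seed, the `var3_k` column (var3 expressed in thousands) and the object names are hypothetical additions for illustration only:

```r
# Sketch: rescaling a count covariate changes its coefficient, not the fit.
# Simulated data as in the question; the seed is an arbitrary, added choice.
set.seed(42)
n <- 100
dataset <- data.frame(
  outcome = sample(0:1, n, replace = TRUE),
  var1 = sample(0:10, n, replace = TRUE),
  var2 = sample(0:1000, n, replace = TRUE),
  var3 = sample(0:100000, n, replace = TRUE)
)

# Model on the raw counts
fit_raw <- glm(outcome ~ var1 + var2 + var3, family = binomial, data = dataset)

# Same model with var3 in thousands (hypothetical unit chosen for illustration)
dataset$var3_k <- dataset$var3 / 1000
fit_scaled <- glm(outcome ~ var1 + var2 + var3_k, family = binomial, data = dataset)

# Fitted probabilities agree up to numerical tolerance
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_scaled), tolerance = 1e-6))

# The var3 coefficient is multiplied by 1000 when var3 is divided by 1000
coef_ratio <- unname(coef(fit_scaled)["var3_k"] / coef(fit_raw)["var3"])

print(same_fit)    # should print TRUE
print(coef_ratio)  # should be approximately 1000
```

Rescaling therefore matters for interpreting and reporting coefficients (a per-unit effect of a count that ranges into the tens of thousands is tiny), not for the fit itself.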
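The second assumption listed in the question, that the link function correctly represents the relationship between outcome and covariates, amounts in practice to an assumption about each covariate's functional form: a raw count is assumed to act linearly on the log-odds. One way to probe this is to fit alternative forms of a wide-range count and compare them. The R sketch below is not taken from the thread; the log-uniform counts, the data-generating model, the use of `log1p()` and the AIC comparison are all illustrative assumptions:

```r
# Sketch: compare two functional forms for a wide-range count covariate.
# The data-generating model below is hypothetical, chosen only for illustration.
set.seed(1)
n <- 100

# Hypothetical counts spread roughly evenly on a log scale, from 1 to 100000
var3 <- round(10^runif(n, min = 0, max = 5))

# Hypothetical truth: log-odds linear in log(1 + var3)
outcome <- rbinom(n, size = 1, prob = plogis(-4 + 0.4 * log1p(var3)))
dataset <- data.frame(outcome, var3)

# Count entered as-is versus log-transformed (log1p handles zero counts)
fit_linear <- glm(outcome ~ var3, family = binomial, data = dataset)
fit_log <- glm(outcome ~ log1p(var3), family = binomial, data = dataset)

# Lower AIC indicates the better-supported form for these data
aic_values <- c(linear = AIC(fit_linear), log1p = AIC(fit_log))
print(aic_values)
```

Because both models have the same number of parameters, the AIC comparison reduces to comparing deviances; with fewer than 100 observations, as in the question, such comparisons can be noisy.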



