[R] Regression model when dependent variable can only take positive values

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Tue Dec 6 14:15:54 CET 2011

Dear Michael,

Did you measure newborns? If not, center age on a value that makes sense given the range of ages in your dataset. The intercept will then be the expected height at that reference age (for x1 = 0), and most likely non-negative.
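A minimal sketch of the centering idea, using simulated data. All numbers below are invented for illustration, and the reference age of 15 is an arbitrary choice; x2c is just a name I made up for the centered age.

```r
# Simulated data; the coefficients and age range are made up.
set.seed(1)
n   <- 200
x1  <- rbinom(n, 1, 0.5)                 # 0 = male, 1 = female
x2  <- runif(n, 12, 17)                  # age in years, no newborns
y   <- -20 - 5 * x1 + 11 * x2 + rnorm(n, sd = 5)   # height in cm
dat <- data.frame(y, x1, x2)

# Uncentered fit: the intercept is an extrapolation to age 0.
fit_raw <- lm(y ~ x1 + x2, data = dat)

# Centered fit: subtract a reference age inside the observed range.
ref_age <- 15                            # hypothetical reference age
dat$x2c <- dat$x2 - ref_age
fit_ctr <- lm(y ~ x1 + x2c, data = dat)

coef(fit_raw)
coef(fit_ctr)
```

The slopes for x1 and age are identical in both fits. Only the intercept changes: in the centered fit it equals the uncentered intercept plus ref_age times the age slope, i.e. the fitted height of a male at the reference age.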

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Thierry.Onkelinx at inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Haenlein
Sent: Tuesday 6 December 2011 14:09
To: r-help at r-project.org
Subject: [R] Regression model when dependent variable can only take positive values

Dear all,

I would like to run a regression of the form lm(y ~ x1+x2) where the dependent variable y can only take positive values. Assume, for example, that y is the height of a person (measured in cm), x1 is the gender (measured as a binary indicator with 0=male and 1=female) and x2 is the age of the person (measured in years).

When I run a simple lm(y ~ x1+x2), I obtain a negative intercept. I interpret this to mean that a person who is male (x1=0) and just born (x2=0) has a negative height. This evidently does not make sense. I therefore assume that my estimates might be biased and that I need to use some other form of estimation that takes into account the fact that y>0 for all observations.

Could anybody please tell me which type of regression would be most recommendable for this type of analysis?

Thanks very much in advance,


