[R] predicted values
gunter.berton at gene.com
Tue Feb 4 00:58:04 CET 2014
... but do note that doing what you describe (using predicted values
for missings) can mess up inference: it obviously results in
underestimating error variability. If you're not doing inference, then
probably no harm, no foul. If you are, then here's to
irreproducibility! If you want to handle missings and still get
meaningful inference (an oxymoron?), then find someone expert in such
matters to consult. R has several packages devoted to this (but I'm
not the person to advise about them).
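
A minimal sketch of the variance point above, using base R only. The data, the 30% missingness rate, and all variable names are hypothetical and are not taken from the thread; an ordinary linear model stands in for the gam to keep the example self-contained.

```r
# Hypothetical data: a linear trend plus noise, with 60 of 200 values removed.
set.seed(1)
n <- 200
x <- seq_len(n)
y <- 5 + 0.1 * x + rnorm(n, sd = 2)
y_obs <- y
y_obs[sample(n, 60)] <- NA

# Fit on the observed cases only (lm drops rows with NA by default).
fit_obs <- lm(y_obs ~ x)

# Fill each missing value with its prediction, then refit on the "complete" data.
y_fill <- ifelse(is.na(y_obs), predict(fit_obs, data.frame(x = x)), y_obs)
fit_fill <- lm(y_fill ~ x)

# Residual standard error and slope standard error, before and after filling.
sigma_obs  <- summary(fit_obs)$sigma
sigma_fill <- summary(fit_fill)$sigma
se_obs  <- coef(summary(fit_obs))["x", "Std. Error"]
se_fill <- coef(summary(fit_fill))["x", "Std. Error"]
```

The filled-in points sit exactly on the fitted line, so they add degrees of freedom without adding any residual variation. The coefficients stay the same, but `sigma_fill` and `se_fill` come out smaller than `sigma_obs` and `se_obs`, which is the overstated precision warned about above.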
Also note that scientists often treat censored values as missing. That's
another booboo. My humble apologies if this is not you.
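
A small sketch of why censoring is not the same as missingness, again in base R. The lognormal data and the detection limit `lod` are hypothetical and chosen only for illustration.

```r
# Hypothetical measurements with a lower detection limit.
set.seed(7)
true_values <- rlnorm(1000, meanlog = 0, sdlog = 1)
lod <- 0.5                               # hypothetical detection limit
censored <- true_values < lod            # these would be reported as "< 0.5"

# Treating censored readings as missing discards only the smallest values.
mean_true <- mean(true_values)
mean_drop <- mean(true_values[!censored])
```

Here `mean_drop` is larger than `mean_true`, because a censored reading still carries information (the value lies below `lod`) that is thrown away when it is recoded as `NA`. Methods for censored data, for example `survreg()` with a `Surv()` response in the survival package, are designed to use that information.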
Finally note that graphics often handles missings sensibly, gracefully
ignoring them. So if graphs are what you seek, maybe you don't need to
worry about it.
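
As a quick illustration of that behaviour in base graphics: the dates below follow the dummy dataset in the quoted message, while the `value` column is made up for this sketch.

```r
# Hypothetical daily series with two missing days.
d <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(12, 15, NA, 9, 30, 28, NA, 14, 11, 40, 35, 20)
)

pdf(NULL)  # null device, so the sketch also runs non-interactively
# Points with NA are not drawn, and the connecting line breaks around them.
plot(d$iddate, d$value, type = "b", xlab = "Date", ylab = "Value")
dev.off()
```

No imputation is needed for the plot; the two missing days simply show up as gaps.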
And, it should go without saying that given my complete ignorance of
what you're up to, all the above should be taken with the appropriate
dose of salt.
Genentech Nonclinical Biostatistics
"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch
On Mon, Feb 3, 2014 at 2:23 PM, Felipe Carrillo
<mazatlanmexico at yahoo.com> wrote:
> Hi Joshua,
> Thanks for the suggestion, I will look into the log link. I basically just want to fill in
> missing values for days where data is not available. Negative values definitely won't work
> for the kind of data that I am collecting.
> On Saturday, February 1, 2014 7:51 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Dear Felipe,
>>That is normal behavior: the prediction for that simple model
>>decreases over time and ends up negative. If the outcome cannot take
>>on negative values, treating it as a continuous Gaussian may not be
>>optimal. Perhaps some transformation, like using a log link so that
>>the exponentiated values are always positive, would be better?
>>Alternatively, the predictions may be going negative not because the
>>data do overall, but because, say, the values decrease quickly in the
>>first part of the time span and more slowly later on. With an overly
>>simplistic time model, the fit may just keep decreasing. Using a smoother
>>with a higher basis dimension may help model the function more
>>accurately over the span of time in your dataset and then not have
>>negative predictions.
>>I do not think that there would be any straightforward way to 'force' the
>>model to be positive only.
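
A hedged sketch of the two suggestions above (a log link and a larger basis dimension), using `gam()` from the mgcv package. The data are simulated for illustration, and the time variable `t`, the basis size `k = 10`, and the Gaussian family with a log link are assumptions rather than choices made in the thread.

```r
library(mgcv)  # provides gam(); a recommended package that ships with R

# Hypothetical data: values fall quickly at first, then level off near zero.
set.seed(42)
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-02-09"), by = "days")
)
test$t <- as.numeric(test$iddate - min(test$iddate))   # days since start
test$value <- 50 * exp(-0.15 * test$t) + 0.5 + runif(nrow(test), 0, 1)
test$value[c(5, 12, 20, 33, 40)] <- NA                 # days with no data

# Gaussian errors with a log link: the smooth is fitted on the log scale,
# so predictions with type = "response" are exponentiated and positive.
mod <- gam(value ~ s(t, k = 10), family = gaussian(link = "log"), data = test)
pred <- predict(mod, newdata = test, type = "response")

# Keep observed values and fill only the missing days.
test$pred <- ifelse(is.na(test$value), pred, test$value)
```

Because the response-scale prediction is the exponential of the linear predictor, it cannot go negative, whatever the shape of the fitted smooth. Whether a log link is appropriate for the real data is a separate modelling question.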
>>On Sat, Feb 1, 2014 at 5:05 PM, Felipe Carrillo
>><mazatlanmexico at yahoo.com> wrote:
>>> Consider this dummy dataset.
>>> My real dataset, with over 1000 records, has
>>> scattered large and small values.
>>> I want to predict for values with NA, but I
>>> get negative predictions. Is this normal
>>> behaviour, or am I missing a gam argument
>>> to force the model to predict positive values?
>>> test <- data.frame(iddate=seq(as.Date("2014-01-01"),
>>> as.Date("2014-01-12"), by="days"),
>>> mod <- gam(value ~ s(as.numeric(iddate)),data=test)
>>> # Predict for values with NA's
>>> test$pred <- with(test,ifelse(is.na(value),predict(mod,test),value))
>>> [[alternative HTML version deleted]]
>>> R-help at r-project.org mailing list
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>Ph.D. Student, Health Psychology
>>University of California, Los Angeles
>>Senior Analyst - Elkhart Group Ltd.