[R] Which distribution best fits the data?

Matthieu Stigler Matthieu.Stigler at gmail.com
Tue Jul 1 13:25:03 CEST 2008


Hello

Regression with time series is more complicated than usual, and I
recommend that you read more about it in any time series textbook.
The biggest problem comes from so-called spurious regression: a
regression between non-stationary series can lead to erroneous
conclusions (if you understand French, see the Wikipedia page I wrote,
http://fr.wikipedia.org/wiki/Régression_fallacieuse, with R simulation
examples).
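
For readers who do not read French, here is a minimal sketch of that
kind of simulation. It is not the code from the Wikipedia page: the
names n_obs, n_sims and slope_pvalue are made up for this illustration,
and n_obs = 36 simply matches the 36 years mentioned in this thread. It
regresses pairs of independent series on each other and counts how
often the slope looks significant at the 5% level:

```r
# Sketch: how often does lm() report a "significant" slope between two
# series that are independent by construction?
set.seed(42)
n_obs  <- 36    # illustrative: same length as a 36-year seasonal series
n_sims <- 1000  # number of simulated pairs (arbitrary choice)

# p-value of the slope in a simple linear regression of y on x
slope_pvalue <- function(x, y) {
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

# Independent random walks: both series are non-stationary
p_walk <- replicate(
  n_sims,
  slope_pvalue(cumsum(rnorm(n_obs)), cumsum(rnorm(n_obs)))
)

# Independent white noise: both series are stationary
p_noise <- replicate(
  n_sims,
  slope_pvalue(rnorm(n_obs), rnorm(n_obs))
)

# Share of simulations in which the slope is "significant" at 5%
reject_walk  <- mean(p_walk  < 0.05)
reject_noise <- mean(p_noise < 0.05)
cat("random walks:", reject_walk, "\n")
cat("white noise: ", reject_noise, "\n")
```

Since there is no true relationship in either case, the white-noise
rate should stay close to the nominal 5%, while the random-walk rate
should come out far higher. That gap is the spurious regression
problem.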

In your case, you should test all variables for stationarity (not only
the residuals) to ensure that your results are correct. See the
packages urca and vars for this.
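
A sketch of such a check with an augmented Dickey-Fuller test, assuming
the urca package is installed. The two series are simulated stand-ins
(not the rainfall or sea-temperature data), and type = "drift" and
lags = 1 are arbitrary choices for illustration:

```r
# Sketch: unit-root (ADF) test on one non-stationary and one stationary
# simulated series, using ur.df() from the urca package.
set.seed(1)
n_obs <- 36
walk  <- cumsum(rnorm(n_obs))  # random walk: has a unit root
noise <- rnorm(n_obs)          # white noise: stationary

if (requireNamespace("urca", quietly = TRUE)) {
  adf_walk  <- urca::ur.df(walk,  type = "drift", lags = 1)
  adf_noise <- urca::ur.df(noise, type = "drift", lags = 1)

  # The null hypothesis is a unit root (non-stationarity). It is
  # rejected when the tau statistic lies below the critical value.
  print(adf_walk@teststat)
  print(adf_walk@cval)
  print(adf_noise@teststat)
  print(adf_noise@cval)
} else {
  message("Package 'urca' is not installed; skipping the ADF test.")
}
```

With real data you would run the same test on each variable of the
regression. With only 36 observations the test has little power, so
read the result with caution.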


Hope this helps

Matthieu

> Jenny,
>
> You may try here: http://en.wikipedia.org/wiki/Normality_test which
> mentions the R package nortest
>
> and here:
>
> The Probability Plot Correlation Coefficient Test for Normality, James
> J. Filliben:
>
> http://www.jstor.org/sici?sici=0040-1706(197502)17%3A1%3C111%3ATPPCCT%3E2.0.CO%3B2-6&cookieSet=1
> http://www.minitab.com/resources/articles/normprob.pdf
> http://engineering.tufts.edu/cee/people/vogel/publications/probability1986.pdf
>
> Regards,
> Tom
>
> Jenny Barnes wrote:
> > Hi Ben and R-help community,
> >
> > More specifics:
> >
> > I am using sea-surface temperature (averaged over an area) and also
> > winds (averaged over an area) to use in a linear regression model as
> > predictors for rainfall over a small region of Africa. So I have one
> > time series of sea-temp and one time series of rainfall (over 36 years
> > - seasonal average) and I have performed the linear regression between
> > the 2. I now want to check if the residuals are normally distributed.
> > If they are not I want an R function that will tell me what
> > distribution they are most similar to - so that I can apply a suitable
> > transformation to make the data normal.....
> >
> > Any more tips now that you have a few more details perhaps? :o)
> >
> > Thanks for your time,
> >
> > Jenny
> >
> > On Mon, 30 Jun 2008, Ben Bolker wrote:
> >
> >> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
> >>
> >>>
> >>> Dear R-help community,
> >>>
> >>> Does anybody know of a stats function in R that tells you which
> > >>> distribution best fits your data? I have tried looking through the
> > >>> archives but have only found functions that tell you if it's normal
> > >>> or log etc. specifically - I am looking for a function that tells
> > >>> you (given a time series) what the distribution is.
> >>>
> >>> Any help/advice will be greatly appreciated,
> >>>
> >>> All the best,
> >>>
> >>> Jenny Barnes
> >>>
> >>> jmb <at> mssl.ucl.ac.uk
> >>
> >>   The problem is that it's not generally a good
> >> idea to data-dredge in this way. Your best bet is
> >> to think about the characteristics of the
> >> data (discrete or continuous, non-negative or real,
> >> symmetric or skewed) and try to narrow it down to
> >> a few distributions -- then you can use fitdistr()
> >> (from the MASS package) or something similar
> >> to compare among them.
> >>
> >>  If you say a little bit more about what
> >> you're trying to do with the data you might
> >> get some more specific advice.
> >>
> >>  Ben Bolker
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
>
> -- 
> Thomas E Adams
> National Weather Service
> Ohio River Forecast Center
> 1901 South State Route 134
> Wilmington, OH 45177
>
> EMAIL:    thomas.adams at noaa.gov
>
> VOICE:    937-383-0528
> FAX:    937-383-0033
>


