[R] Which distribution best fits the data?

Jenny Barnes jmb at mssl.ucl.ac.uk
Mon Jun 30 14:10:24 CEST 2008


Thanks Ben - will give the MASS package a look and try the boxcox function!

Appreciate your time in answering my question :o)

Jenny

On Mon, 30 Jun 2008, Ben Bolker wrote:

> Much better.
>
> If m is your linear regression model,
>
> * boxcox(m) in the MASS package will look for a power (more or less)
> transformation of the response to normalize the residuals -- see the
> book for more information
>
> * plot(m) will produce plots including a Q-Q plot
> (a visual check of normality) of the residuals
>
> * don't forget to check for autocorrelation in the
> residuals (acf(residuals(m)))
>
> Ben Bolker
>
>
> Jenny Barnes wrote:
> | Hi Ben and R-help community,
> |
> | More specifics:
> |
> | I am using sea-surface temperature and winds (each averaged over an
> | area) as predictors in a linear regression model for rainfall over a
> | small region of Africa. So I have one time series of sea temperature
> | and one time series of rainfall (seasonal averages over 36 years), and
> | I have performed the linear regression between the two. I now want to
> | check whether the residuals are normally distributed. If they are not,
> | I want an R function that will tell me which distribution they most
> | resemble, so that I can apply a suitable transformation to make the
> | data normal.
> |
> | Any more tips now that you have a few more details perhaps? :o)
> |
> | Thanks for your time,
> |
> | Jenny
> |
> | On Mon, 30 Jun 2008, Ben Bolker wrote:
> |
> |> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
> |>
> |>>
> |>> Dear R-help community,
> |>>
> |>> Does anybody know of a stats function in R that tells you which
> |>> distribution best fits your data? I have tried looking through the archives
> |>> but have only found functions that tell you if it's normal or log etc.
> |>> specifically - I am looking for a function that tells you (given a
> |>> timeseries) what the distribution is.
> |>>
> |>> Any help/advice will be greatly appreciated,
> |>>
> |>> All the best,
> |>>
> |>> Jenny Barnes
> |>>
> |>> jmb <at> mssl.ucl.ac.uk
> |>
> |>   The problem is that it's not generally a good
> |> idea to data-dredge in this way. Your best bet is
> |> to think about the characteristics of the
> |> data (discrete or continuous, non-negative or real,
> |> symmetric or skewed) and try to narrow it down to
> |> a few distributions -- then you can use fitdistr()
> |> (from the MASS package) or something similar
> |> to compare among them.
> |>
> |>  If you say a little bit more about what
> |> you're trying to do with the data you might
> |> get some more specific advice.
> |>
> |>  Ben Bolker
> |>

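The three residual checks Ben lists for a fitted model m can be sketched
roughly as below. This is only an illustration on simulated data: the
variables sst and rain, their values, and the name lambda_hat are made up
for the example and are not Jenny's data. boxcox() needs a strictly
positive response, which the simulated rainfall satisfies.

```r
library(MASS)  # provides boxcox()

set.seed(1)
n   <- 36  # 36 seasonal averages, as in the thread
dat <- data.frame(sst = rnorm(n, mean = 27, sd = 0.5))  # hypothetical predictor
# Hypothetical positive, right-skewed response (NOT real rainfall data)
dat$rain <- exp(1 + 0.3 * (dat$sst - 27) + rnorm(n, sd = 0.3))

# y = TRUE keeps the response in the fit, so boxcox() need not refit the model
m <- lm(rain ~ sst, data = dat, y = TRUE)

# Profile log-likelihood over a grid of Box-Cox powers of the response
bc <- boxcox(m, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]  # grid value with the highest likelihood

# Q-Q plot of the standardized residuals (a visual check, not a formal test)
plot(m, which = 2)

# Autocorrelation of the residuals; plot(r_acf) would draw the correlogram
r_acf <- acf(residuals(m), plot = FALSE)
```

A lambda_hat near 0 would point towards a log transformation and a value
near 1 towards leaving the response alone; with only 36 points the
likelihood profile is usually flat enough that nearby powers are hard to
tell apart.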


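Ben's earlier suggestion, narrowing the choice to a few plausible
distributions and comparing them with fitdistr(), might look something
like the following. The data x and the three candidate families are
assumptions chosen for illustration (positive, right-skewed values);
comparing the fits by AIC is one common choice, not something prescribed
in the thread.

```r
library(MASS)  # provides fitdistr()

set.seed(2)
x <- rgamma(200, shape = 2, rate = 0.5)  # hypothetical positive, skewed data

# Maximum-likelihood fits for a short list of candidate distributions
fits <- list(
  gamma     = fitdistr(x, "gamma"),
  lognormal = fitdistr(x, "lognormal"),
  weibull   = fitdistr(x, "weibull")
)

# Lower AIC indicates a better trade-off between fit and parameter count
aic <- sapply(fits, AIC)
aic[order(aic)]
```

The candidates should come from what is known about the data (discrete
or continuous, non-negative or real, symmetric or skewed) rather than
from trying every family available.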