[R] Website, book, paper, etc. that shows example plots of distributions?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Feb 15 02:41:06 CET 2009


The regression book by John Fox:
http://socserv.mcmaster.ca/jfox/Books/Companion/index.html
has a section on regression diagnostics, and everything is done
in R, which might make it particularly suitable.
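
As a rough illustration of the point David makes below (it is the errors,
not the response itself, that should look normal), here is a minimal sketch
using only base R. The simulated data, the variable names and the
coefficients are assumptions made up for this example; they are not taken
from the book or from your data.

```r
# Simulated data (an assumption for illustration only): a right-skewed
# predictor makes y look non-normal even though the errors are Gaussian.
set.seed(1)
n <- 200
x <- rexp(n, rate = 1)             # right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 1)  # linear signal plus normal errors

fit <- lm(y ~ x)       # ordinary least-squares fit
res <- residuals(fit)  # estimated errors

op <- par(mfrow = c(2, 2))
hist(y, main = "Response y")                       # skewed histogram
qqnorm(y, main = "Q-Q plot of y"); qqline(y)       # departs from the line
qqnorm(res, main = "Q-Q plot of residuals"); qqline(res)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")                 # look for even spread
abline(h = 0, lty = 2)
par(op)
```

If the residual Q-Q plot is roughly straight and the residuals-vs-fitted
plot shows no funnel or curve, transforming y first is probably not needed.
Calling plot(fit) gives a similar set of standard diagnostic plots.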

On Sat, Feb 14, 2009 at 6:48 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
> Many thanks to Greg L. Snow and David Winsemius for their responses.
>
> First off I can safely say I don't know enough statistics to be dangerous,
> but hopefully I will get to that point:)
>
> Regarding the goal - ultimately I would like to use linear regression
> (I am constrained to linear regression at this point) for my data.  I
> thought the requirements for using linear regression were the following (I
> pulled this list from
> www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt):
>
> The assumptions required for utilizing a regression equation are the same as
> the assumptions for the test of significance of a correlation coefficient:
>   - Both variables are interval level.
>   - Both variables are normally distributed.
>   - The relationship between the two variables is linear.
>   - The variance of the values of the dependent variable is uniform for all
>     values of the independent variable (equality of variance).
>
> Thus, I was going to attempt to (1) identify which distribution my data most
> closely represents, (2) translate my data so that it is normal, and (3) then
> use linear regression on the data.
>
> However, if
> "The assumption of most regression methods is that the *errors* need to
> have the desired relationship between means and variance, and not that the
> dependent variable be "normal". Many times the apparent non-normality will
> be "explained" or "captured" by the regression model."
>
> Does this mean I can just "do" linear regression without translating my data
> and it will be okay?
>
> Note that I was using "lm" from R to access the errors; however, I have not
> had an opportunity to do much analysis of those results to determine whether
> they are Gaussian or not.
>
> I guess I am going to try to track down the following documents:
> (1) Statistical Distributions (Paperback)
> by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock (Author)
> # ISBN-10: 0471371246
> # ISBN-13: 978-0471371243
>
> (2) Regression Modeling Strategies (Hardcover)
> by Frank E. Jr. Harrell (Author)
> # ISBN-10: 0387952322
> # ISBN-13: 978-0387952321
>
> Maybe electronic versions of those documents are available.  My wife is
> already giving me a hard time about the volume of books around.
>
> Thank you again for all your feedback and insights.
>
>
> --- On Fri, 2/13/09, David Winsemius <dwinsemius at comcast.net> wrote:
>
> From: David Winsemius <dwinsemius at comcast.net>
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
> To: jasonkrupert at yahoo.com
> Cc: "Gabor Grothendieck" <ggrothendieck at gmail.com>, R-help at r-project.org
> Date: Friday, February 13, 2009, 9:10 AM
>
> This is probably the right time to issue a warning about the error of making
> transformations on the dependent variable before doing your analysis. The
> classic error that newcomers to statistics commit is to decide that they
> want to "make their data normal". The assumption of most regression methods
> is that the *errors* need to have the desired relationship between means and
> variance, and not that the dependent variable be "normal". Many times the
> apparent non-normality will be "explained" or "captured" by the regression
> model. Other methods of modeling non-linear dependence are also available.
>
> I found Harrell's book "Regression Modeling Strategies" to be an excellent
> source for alternatives. My copy of V&R's MASS is only the second edition,
> but chapters 5 & 6 in that edition on linear models also had examples of
> using QQ plots on residuals. Checking that text's website, I see that
> chapter 6 at least is probably similar. They include the scripts from their
> chapters along with the MASS package (installed as part of the VR bundle).
> My copy is entitled "ch06.R" and resides in the scripts subdirectory:
> /Library/Frameworks/R.framework/Versions/2.8/Resources/library/MASS/scripts/ch06.R
>
> --David Winsemius
>
>
> On Feb 13, 2009, at 8:11 AM, Jason Rupert wrote:
>
>> Thank you very much.  Thank you again regarding the suggestion below.  I
>> will give that a shot and I guess I've got my work cut out for me.  I
>> counted 45 different distributions.
>>
>> Is the best way to get a Q-Q plot of each to produce a data set for each
>> distribution, use the qqplot function to get a Q-Q plot of that
>> distribution, and then compare it with the distribution of my data?
>>
>> As you can tell I am not a trained statistician, so any guidance or
>> suggested further reading is greatly appreciated.
>>
>> I am fairly sure my data is not normally distributed, based on some of the
>> empirical "Goodness of Fit" tests and on comparing the Q-Q plot of my data
>> against the Q-Q plot of a normal distribution with the same number of
>> points.  I guess the next step is to figure out which distribution my data
>> most closely matches.
>>
>> Also, I guess I could fool around and take the log, sqrt, etc. of my data
>> and see if it then more closely resembles a normal distribution.
>>
>> Thank you again for assisting this novice data analyst who is trying to
>> gain a better understanding of the techniques using this powerful software
>> package.
>>
>>
>>
>>
>> --- On Fri, 2/13/09, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> From: Gabor Grothendieck <ggrothendieck at gmail.com>
>> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
>> distributions?
>> To: jasonkrupert at yahoo.com
>> Cc: R-help at r-project.org
>> Date: Friday, February 13, 2009, 5:43 AM
>>
>> You can readily create a dynamic display using qqplot and similar
>> functions in conjunction with either the playwith or TeachingDemos
>> packages.
>>
>> For example, to investigate the effect of the shape parameter in the skew
>> normal distribution on its qqplot relative to the normal distribution:
>>
>>   library(playwith)
>>   library(sn)
>>   playwith(qqnorm(rsn(100, shape = shape)),
>>       parameters = list(shape = seq(-3, 3, .1)))
>>
>> Now move the slider located at the bottom of the window that
>> appears and watch the plot change in response to changing
>> the shape value.
>>
>> You can find more distributions here:
>> http://cran.r-project.org/web/views/Distributions.html
>>
>> On Thu, Feb 12, 2009 at 1:04 PM, Jason Rupert <jasonkrupert at yahoo.com>
>> wrote:
>>> By any chance is anyone aware of a website, book, paper, etc., or a
>>> combination of those sources, that shows plots of different distributions?
>>>
>>> After reading a pretty good whitepaper I became aware of the benefit of
>>> doing Q-Q plots and histograms to help assess a distribution. The
>>> whitepaper is called:
>>> "Univariate Analysis and Normality Test Using SAS, Stata, and SPSS",
>>> Hun Myoung Park, (c) 2002-2008 The Trustees of Indiana University
>>>
>>> Unfortunately the whitepaper does not provide many example distributions
>>> plotted using Q-Q plots and histograms, so I am curious if there is a
>>> "portfolio"-type website or other whitepaper that shows examples of
>>> various types of distributions.
>>>
>>> It would be helpful to see a bunch of Q-Q plots and their associated
>>> histograms to get an idea of how each distribution looks in comparison
>>> with the Gaussian.
>>>
>>> I think seeing the plot really helps.
>>>
>>> Thank you for any insights.
>>>
>>>
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.



