[R] Least square minimization (non-linear)

Liaw, Andy andy_liaw at merck.com
Sun Jan 9 04:25:34 CET 2005


I think the question is fairly clear (to me, at least).  My problem is
`Why?'

If I'm not mistaken, what Choudary has been asked to do is fit a gaussian
density to the data by fitting the gaussian pdf to (x, y) data, where x are
the midpoints of the bins and y are the heights of the histogram (on the
density scale), via nonlinear least squares.  The fitted distribution is, of
course, guaranteed to be a real density, as it's a gaussian pdf with
parameters estimated by NLS.
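
A minimal sketch of that procedure in R, for concreteness.  The data are
simulated (the true mean of 10 and sd of 2 are made up, standing in for the
6000 values), and the bin count and the names d, m and s are illustrative
assumptions, not anything from the original question.

```r
set.seed(1)
x <- rnorm(6000, mean = 10, sd = 2)      # simulated stand-in for the real data

# Histogram on the density scale; mids are the x values, density the y values
h <- hist(x, breaks = 30, plot = FALSE)
d <- data.frame(mid = h$mids, dens = h$density)

# Starting values: moments of the binned data
m0 <- sum(d$mid * d$dens) / sum(d$dens)
s0 <- sqrt(sum(d$dens * (d$mid - m0)^2) / sum(d$dens))

# Nonlinear least squares fit of the gaussian pdf to the bar heights
fit <- nls(dens ~ dnorm(mid, mean = m, sd = s),
           data = d, start = list(m = m0, s = s0))
coef(fit)
```

The estimates in coef(fit) depend on the breaks passed to hist(), a point
taken up further down.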

What Choudary (and his colleague) may not realize is that that's about as
convoluted a way of estimating the parameters as one can imagine (or
perhaps beyond imagination?).  I do not see any advantage of doing things
this way over just estimating the parameters by the sample mean and variance
(or perhaps the MLE).  At least the statistical properties of those
estimators are well known (and optimal in a certain sense).
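
The direct route takes only a couple of lines.  The sketch below again uses
simulated data as an assumption (true mean 10, sd 2); fitdistr() in MASS
returns the normal MLE, whose sd uses the divisor n rather than n - 1.

```r
library(MASS)

set.seed(1)
x <- rnorm(6000, mean = 10, sd = 2)      # simulated stand-in for the real data

# Sample mean and standard deviation (sd() uses the n - 1 divisor)
c(mean = mean(x), sd = sd(x))

# Maximum likelihood estimates under a normal model
mle <- fitdistr(x, "normal")
mle$estimate                             # MLE of mean and sd
mle$sd                                   # their estimated standard errors
```

With 6000 observations the two sets of estimates are practically identical.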

If one is going to fit a gaussian distribution, just do it directly.
There's no need to go half way around the world to do that.  If you are
going to use the histogram, how do you decide on how many bins to use, and
where the boundaries of the bins should be?  Even with a fixed number of
bins and bin width, there's not a unique histogram for a set of data.  Which
one should you use?  How do you justify these choices?
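
To illustrate the dependence on binning, the sketch below refits the same
simulated data (an assumption, as are the helper name nls_from_hist and the
three sets of breaks) with different bin widths and a shifted bin origin.

```r
set.seed(1)
x <- rnorm(6000, mean = 10, sd = 2)      # simulated stand-in for the real data

# Hypothetical helper: NLS fit of a gaussian pdf to a histogram of x
nls_from_hist <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  d <- data.frame(mid = h$mids, dens = h$density)
  m0 <- sum(d$mid * d$dens) / sum(d$dens)
  s0 <- sqrt(sum(d$dens * (d$mid - m0)^2) / sum(d$dens))
  coef(nls(dens ~ dnorm(mid, mean = m, sd = s),
           data = d, start = list(m = m0, s = s0)))
}

lo <- floor(min(x))
hi <- ceiling(max(x))

# Same data, three different histograms
est <- rbind(
  width_1.0 = nls_from_hist(x, seq(lo, hi, by = 1)),
  width_0.5 = nls_from_hist(x, seq(lo, hi, by = 0.5)),
  shifted   = nls_from_hist(x, seq(lo - 0.5, hi + 0.5, by = 1))
)
est
```

Each row of est is a different answer from the same data, whereas mean(x)
and sd(x) involve no such choice.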

If the goal is _not_ to fit a gaussian distribution to the data, then please
do explain what it is.  If by `plotting experimental values vs. theoretical
values' you are trying to assess the normality of the data, then the Q-Q
plot (qqnorm() as Spencer suggested) is a far better choice.
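
A minimal sketch of that check, again on simulated data (an assumption):

```r
set.seed(1)
x <- rnorm(6000, mean = 10, sd = 2)      # simulated stand-in for the real data

# Sample quantiles against theoretical normal quantiles
qq <- qqnorm(x, main = "Normal Q-Q plot")
qqline(x)                                # reference line through the quartiles
```

Points lying close to the line are consistent with normality; systematic
curvature, especially in the tails, suggests otherwise.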

Andy

> From: Spencer Graves
> 
>       What are you trying to accomplish? 
> 
>       If you want to model the distribution of the data, and if the
> numbers are plausibly normally distributed, I'd start with "qqnorm".  If
> you have some distribution in mind, I'd try "fitdistr" in library(MASS).
> If it's neither of these, please make another attempt to "read the
> posting guide! http://www.R-project.org/posting-guide.html".  Your
> question may seem perfectly clear to you, but it seems to me to be too
> general to answer;  carefully following the instructions in the posting
> guide can increase the likelihood that someone will understand what you
> are trying to do enough to actually help you.
> 
>       hope this helps.
>       spencer graves
> 
> Jagarlamudi, Choudary wrote:
> 
> >Hi all,
> >I think the last time I posted this topic I started on the wrong foot.
> >Thanks a lot to everyone who responded.
> >I'm coding in R (first time) for a paper my colleague is publishing.  I
> >plotted a histogram for 6000 values.
> >I am told to plot experimental vs theoretical values from the histogram,
> >do a nonlinear least squares curve minimization, and compute the mean
> >and sd of the new x values.  Excuse me if this sounds too naive.  Can you
> >help me in getting a start on this one?
> >
> >Choudary Jagarlamudi
> >Instructor
> >Southwestern Oklahoma State University
> >STF 254
> >100 campus Drive
> >Weatherford OK 73096
> >Tel 580-774-7136
> >
> >______________________________________________
> >R-help at stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >  
> >



