[R] fitting a gaussian to some x,y data

Fri Aug 25 19:12:19 CEST 2006

Michael Koppelman wrote:

> I apologize if this is redundant. I've been Googling, searching the
> archive and reading the help all morning and I am not getting closer
> to my goal.
> 
> I have a series of data( xi, yi). It is not evenly sampled and it is
> messy (meaning that there is a lot of scatter in the data). I want to
> fit a normal distribution (i.e. a gaussian) to the data in order to
> find the center. (The data has a loose "normal" look to it.) I have
> done this with polynomial fitting in R with nls but want to try it
> with a Gaussian (I am a student astronomer and have not a lot of
> experience in statistics).
> 
> In looking at the fitdistr function, it seems to take as input a
> bunch of x values and it will fit a gaussian to the histogram. That
> is not what I need to do, I want to fit a normal distribution to the
> x,y values and get out the parameters of the fit. I'm fooling with
> nls but it seems to want the data in some other format or something
> because it is complaining about "singular gradient".
> 
> I'm sure this isn't hard and the answer must be out there somewhere  
> but I can't find it. I appreciate any assistance.

	Fitting a distribution simply means estimating
	the parameters of that distribution.

	For a Gaussian distribution this problem is trivial.
	The parameters are the mean vector mu and the
	covariance matrix Sigma.

	The (optimal; maximum-likelihood-adjusted-to-be-unbiased)
	estimates of these are the obvious ones --- the sample
	mean and the sample covariance.  In R you most easily
	get these by

		o cbinding your ``x'' and ``y'' vectors into
		matrix:

			> xy <- cbind(x,y)

		o applying mean() to this matrix:

			> mu.hat <- apply(xy,2,mean)

		o calling upon var():

			> Sigma.hat <- var(xy)

	That is all there is to it.

	If all you really want is an estimate of the ``centre''
	then this estimate is just mu.hat; you don't need to
	``fit a distribution'' at all.

	From your description of the problem there may well be
	other issues --- lack of independence of your data,
	biased sample, outliers, skewness, god knows what. Such
	issues should be dealt with as carefully as possible.
	It may be the case that you should be doing some sort
	of robust estimation.  Or it may be the case that this
	data set is hopeless and should be thrown away and a
	new data set collected in a manner that will actually
	yield some information.

	But ``fitting a distribution'' is *NOT* the issue. 

	I don't mean to be rude, but this question illustrates the
	point that you really shouldn't be *doing* statistics unless
	you *understand* some statistics.  At the very best you'll
	waste a great deal of time.

				cheers,

					Rolf Turner
					rolf at math.unb.ca