[R] Testing for strength of fit using R

David Winsemius dwinsemius at comcast.net
Thu Nov 26 17:35:24 CET 2009

On Nov 26, 2009, at 9:48 AM, Steve Murray wrote:

> Dear all,
> I am trying to validate a model by comparing simulated output values  
> against observed values. I have produced a simple X-y scatter plot  
> with a 1:1 line, so that the closer the points fall to this line,  
> the better the 'fit' between the modelled data and the observation  
> data.
> I am now attempting to quantify the strength of this fit by using a  
> statistical test in R. I am no statistics guru, but from my limited  
> understanding, I suspect that I need to use the Chi Squared test (I  
> am more than happy to be corrected on this though!).
> However, this results in the following:
>> chisq.test(data$Simulation,data$Observation)
>     Pearson's Chi-squared test
> data:  data$Simulation and data$Observation
> X-squared = 567, df = 550, p-value = 0.2989
> Warning message:
> In chisq.test(data$Simulation, data$Observation) :
>   Chi-squared approximation may be incorrect
> The ?chisq.test document suggests that the objects should be of  
> vector or matrix format, so I tried the following, but still receive  
> a warning message (and different results):
>> chisq.test(as.matrix(data[,4:5]))
>     Pearson's Chi-squared test
> data:  as.matrix(data[, 4:5])
> X-squared = 130.8284, df = 26, p-value = 6.095e-16

When you look at your "data" you see only 27 cases, so it is
implausible that your first invocation, with 550 degrees of freedom,
is giving you something meaningful. Given two vectors, chisq.test
coerces both to factors and cross-tabulates them (see ?chisq.test), so
nearly every distinct value becomes its own row or column; df = 550 =
22 * 25 is what a 23 x 26 table gives. The second one might have been
a more meaningful goodness-of-fit test, but it treats the 27 x 2
matrix itself as a table of counts, hence df = 26 * 1 = 26, which is
why the two calls disagree. Naming the second argument should not
change the first result, but what happens if you try:

chisq.test(data$Simulation, y=data$Observation)  # ?
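
A small sketch of how chisq.test arrives at its degrees of freedom in
the two calling forms. It uses only the ten values visible in the str()
output quoted further down (the full 27-row data frame is not in this
post), so it will not reproduce df = 550 or df = 26; the names obs, sim,
tab, t1 and t2 are illustrative only.

```r
# First ten rows only, copied from the str() output quoted in this post.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Two-vector form: both vectors are coerced to factors and cross-tabulated,
# so the table has one row per distinct sim value and one column per
# distinct obs value.
tab <- table(sim, obs)
dim(tab)                                   # 9 x 10 (1.7 occurs twice in sim)
t1 <- suppressWarnings(chisq.test(sim, obs))
t1$parameter                               # df = (9 - 1) * (10 - 1) = 72

# Matrix form: the 10 x 2 matrix is itself taken as the table of counts.
t2 <- suppressWarnings(chisq.test(cbind(obs, sim)))
t2$parameter                               # df = (10 - 1) * (2 - 1) = 9
```

In neither form are the continuous values compared as paired
measurements, which is why the warning about the approximation appears.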

All of that being said, chisq.test is primarily intended for  
contingency tables. Testing association between two paired continuous  
variables is usually approached with regression and correlation tests.  
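
A minimal sketch of the regression and correlation approach, again
using only the ten values visible in the str() output quoted below. The
bias and RMSE lines are an addition not mentioned above, included on the
assumption that distance from the 1:1 line is what matters for
validation; correlation alone measures linear association, not
agreement.

```r
# First ten rows only, copied from the str() output quoted in this post.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Pearson correlation with a test of H0: correlation = 0.
ct <- cor.test(sim, obs)
ct$estimate
ct$p.value

# Regress observed on simulated. A model that tracks the 1:1 line
# would give an intercept near 0 and a slope near 1.
fit <- lm(obs ~ sim)
coef(fit)
confint(fit)      # do the intervals cover 0 and 1?

# Direct measures of departure from the 1:1 line (not part of the
# original reply): mean error and root mean squared error.
bias <- mean(sim - obs)
rmse <- sqrt(mean((sim - obs)^2))
c(bias = bias, rmse = rmse)
```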


You may also want to look at the Q-Q plot.
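
One reading of this suggestion, sketched with the ten values visible in
the str() output quoted below, is a two-sample Q-Q plot of simulated
against observed values. This is an assumption about what was meant; a
normal Q-Q plot of regression residuals (qqnorm) would be another.

```r
# First ten rows only, copied from the str() output quoted in this post.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# qqplot() sorts each sample and plots quantile against quantile, so it
# compares the two distributions rather than the individual pairs.
qq <- qqplot(obs, sim,
             xlab = "Observed quantiles",
             ylab = "Simulated quantiles")
abline(0, 1)   # 1:1 reference line
```

Points far from the reference line indicate that the simulated values
are distributed differently from the observed ones, even if individual
pairs happen to match.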


David Winsemius

> Warning message:
> In chisq.test(as.matrix(data[, 4:5])) :
>   Chi-squared approximation may be incorrect
> What am I doing wrong and how can I successfully measure how well  
> the simulated values fit the observed values?
> If it's of any help, here is how my data are structured. Note that
> I am only using columns 4 and 5 (Observation and Simulation).
>> str(data)
> 'data.frame':    27 obs. of  5 variables:
>  $ Location        : Factor w/ 27 levels "Australia","Brazil",..: 8 2 13 19 22 14 16 23 6 7 ...
>  $ Vegetation      : Factor w/ 21 levels "Beech","Broadleaf evergreen laurel",..: 17 21 2 16 15 16 9 16 3 4 ...
>  $ Vegetation.Class: Factor w/ 4 levels "Boreal and Temperate Evergreen",..: 3 3 4 1 1 1 4 1 4 1 ...
>  $ Observation     : num  24 8.9 14.7 26.7 42.4 31.7 30.8 7.5 14 22 ...
>  $ Simulation      : num  33.9 7.8 9.74 7.6 11.8 10.7 12 28.1 1.7 1.7 ...
> I hope someone is able to point me in the right direction.
> Many thanks,

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
