[R] Please ignore earlier mail - [ Poisson - Chi Square test for Goodness of Fit]

saggak saggak1908 at yahoo.co.in
Fri Aug 29 11:25:24 CEST 2008





Dear R-help, 

   

   

Chi Square Test for Goodness of Fit 

   

I have discrete data, given below as an R vector:

   

No_of_Frauds<-c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3) 

   

I am trying to fit a Poisson distribution to these data using R.

   

My R script is as follows:

   

________________________________________________________ 

   

# R SCRIPT for Fitting Poisson Distribution

   

No_of_Frauds<-c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3) 

   

N              <-             length(No_of_Frauds) 

   

Average     <-             mean(No_of_Frauds) 

   

Lambda     <-             Average 

   

i               <-             c(0:(N-1)) 

   

pmf           <-             dpois(i, Lambda, log = FALSE) 

   

#----------------------------------------------------------------------------

   

# Ho: The data follow a Poisson distribution vs. H1: Not Ho

   

# observed frequencies (Oi) 

   

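variable.cnts     <- table(No_of_Frauds)

# Poisson probabilities of the observed values
# (R is case-sensitive: the mean was stored as Lambda, not lambda)
variable.cnts.prs <- dpois(as.numeric(names(variable.cnts)), Lambda)

# extra class with zero observations that takes the remaining probability mass
variable.cnts     <- c(variable.cnts, 0)
variable.cnts.prs <- c(variable.cnts.prs, 1 - sum(variable.cnts.prs))

tst               <- chisq.test(variable.cnts, p = variable.cnts.prs)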

   

chi_squared       <- as.numeric(unclass(tst)$statistic)

p_value           <- as.numeric(unclass(tst)$p.value)

df                <- tst[2]$parameter

   

   

cv1               <- qchisq(p = .01, df = tst[2]$parameter, lower.tail = FALSE, log.p = FALSE)

cv2               <- qchisq(p = .05, df = tst[2]$parameter, lower.tail = FALSE, log.p = FALSE)

cv3               <- qchisq(p = .1, df = tst[2]$parameter, lower.tail = FALSE, log.p = FALSE)

   

#----------------------------------------------------------------------------- 

   

# Expected value 

   

# variable.cnts.prs * sum(variable.cnts)

   

   

# if chi_squared > cv, reject Ho at the alpha level of significance (los)

   

#----------------------------------------------------------------------------- 

   

if (chi_squared > cv1) {
  Conclusion1 <- 'Sample does not come from the postulated probability distribution at 1% los'
} else {
  Conclusion1 <- 'Sample comes from the postulated probability distribution at 1% los'
}

if (chi_squared > cv2) {
  Conclusion2 <- 'Sample does not come from the postulated probability distribution at 5% los'
} else {
  Conclusion2 <- 'Sample comes from the postulated probability distribution at 5% los'
}

if (chi_squared > cv3) {
  Conclusion3 <- 'Sample does not come from the postulated probability distribution at 10% los'
} else {
  Conclusion3 <- 'Sample comes from the postulated probability distribution at 10% los'
}

   

#----------------------------------------------------------------------------- 

   

# Printing RESULTS  

   

print(chi_squared) 

   

print(p_value) 

   

print(df) 

   

print(cv1) 

   

print(cv2) 

   

print(cv3) 

   

print(Conclusion1) 

   

print(Conclusion2) 

   

print(Conclusion3) 

   

   

##### End of R Script ########

   

________________________________________________________ 

   

Problem Faced : 

   

When I run this script in the R console, I get a Chi-Square statistic as high as 6.95753e+37.

When I did the same calculations in Excel, I got a Chi-Square statistic of 138.34.


   

It is clear that the sample data do not follow a Poisson distribution, and I
will have to look for another discrete distribution. My problem, however, is
the HIGH value of the Chi-Square test statistic. When I analyzed this further,
I understood the cause.

   

(A) By convention, if the expected frequency of a class is less than 5, such
classes are pooled with adjacent classes to form a new class whose expected
frequency is at least 5, and the observed frequencies are pooled accordingly.

   





  
  X        Oi     Ei    ((Oi - Ei)^2)/Ei
  0         0     10       9.96
  1        72     23     103.79
  2        17     27       3.54
  3         5     21      11.85
  4         3     12       6.71
  5         4      9       2.51
  Total   101    101     138.34
  





   

   

When I apply this logic in Excel, I get a reasonable result (i.e. 138.34).
However, if I do not apply it, the Chi-Square test statistic in Excel is also
as high as 4.70043E+37.
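
A minimal sketch, in R, of why the unpooled statistic becomes so large. It lists the contribution (Oi - Ei)^2 / Ei of each observed value separately; the names n, lambda, diagnostic and the column names are illustrative and not part of the script above. Values far out in the tail have an expected frequency that is practically zero, so a single observation there can dominate the whole sum.

```r
# Data as posted above
No_of_Frauds <- c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3)

n      <- length(No_of_Frauds)
lambda <- mean(No_of_Frauds)          # Poisson mean estimated from the sample

obs  <- table(No_of_Frauds)           # observed frequency of each distinct value
vals <- as.numeric(names(obs))
expd <- n * dpois(vals, lambda)       # expected frequency of each value, no pooling

diagnostic <- data.frame(
  value        = vals,
  observed     = as.vector(obs),
  expected     = expd,
  contribution = (as.vector(obs) - expd)^2 / expd
)

# Largest contributions first
print(diagnostic[order(-diagnostic$contribution), ])
```

Sorting by contribution makes it easy to see which classes need pooling before the test is meaningful.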

   

My question is: how do I modify my R script so that the logic in (A) is
applied, i.e. classes are pooled until the expected frequency of every class
is at least 5 (with the observed frequencies pooled accordingly), giving a
reasonable value of the Chi-Square test statistic?
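
One possible way to apply the logic in (A) in R is sketched below. This is an illustration under stated assumptions, not a definitive implementation: pool_classes() is a hypothetical helper written for this mail (it is not an R function), the probability above the largest observed value is folded into the last class, and the degrees of freedom are reduced by one more because the Poisson mean is estimated from the same data.

```r
# Data as posted above
No_of_Frauds <- c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3)

# Hypothetical helper: pool tail classes until every expected frequency >= min_expected
pool_classes <- function(x, lambda, min_expected = 5) {
  n      <- length(x)
  vals   <- 0:max(x)
  obs    <- as.vector(table(factor(x, levels = vals)))   # counts for 0, 1, ..., max(x)
  prob   <- dpois(vals, lambda)
  # fold P(X > max(x)) into the last class so that the probabilities sum to 1
  prob[length(prob)] <- prob[length(prob)] + ppois(max(x), lambda, lower.tail = FALSE)
  expd   <- n * prob
  labels <- as.character(vals)
  labels[length(labels)] <- paste0(max(x), "+")

  # merge the last class into its left neighbour while its expected count is too small
  while (length(expd) > 2 && expd[length(expd)] < min_expected) {
    k <- length(expd)
    obs[k - 1]    <- obs[k - 1] + obs[k]
    expd[k - 1]   <- expd[k - 1] + expd[k]
    labels[k - 1] <- paste0(vals[k - 1], "+")
    obs <- obs[-k]; expd <- expd[-k]; labels <- labels[-k]; vals <- vals[-k]
  }
  # merge the first class into its right neighbour while its expected count is too small
  while (length(expd) > 2 && expd[1] < min_expected) {
    obs[2]    <- obs[1] + obs[2]
    expd[2]   <- expd[1] + expd[2]
    labels[2] <- paste0("<=", vals[2])
    obs <- obs[-1]; expd <- expd[-1]; labels <- labels[-1]; vals <- vals[-1]
  }
  data.frame(class = labels, observed = obs, expected = expd)
}

lambda <- mean(No_of_Frauds)
pooled <- pool_classes(No_of_Frauds, lambda)

chi_squared <- sum((pooled$observed - pooled$expected)^2 / pooled$expected)
df          <- nrow(pooled) - 1 - 1          # one extra df lost for the estimated mean
p_value     <- pchisq(chi_squared, df = df, lower.tail = FALSE)

print(pooled)
print(c(chi_squared = chi_squared, df = df, p_value = p_value))
```

Passing the pooled counts to chisq.test(pooled$observed, p = pooled$expected / sum(pooled$expected)) should give the same statistic, but its degrees of freedom do not account for the estimated mean, which is why the p-value is computed by hand here.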

   

I am also attaching the xls file for ready
reference. 

   

I sincerely apologize for taking the liberty of writing such a long mail, and
I earnestly request your guidance, since I am very new to the R environment.

   

Thank you in advance for your kind co-operation.

   

Ashok (Mumbai, India)

   

  

   

   

   

   





