[R] chisq.test on samples of different lengths

peter dalgaard pdalgd at gmail.com
Tue Aug 24 16:47:52 CEST 2010


On Aug 24, 2010, at 4:12 PM, Marino Taussig De Bodonia, Agnese wrote:

> Hello,
> 
> I am trying to see whether there has been a significant difference in whether people experienced damages from wildlife in two different years. I therefore have two columns:
> 
> year 1:
> yes
> no
> no
> no
> yes
> yes
> no
> 
> year 2:
> no
> yes
> no
> yes
> 
> I wanted to do a chisq.test, but if I enter it this way:
> 
> chisq.test(year1, year2)
> 
> I get the error saying the columns are two different lengths. So then I tried doing:
> 
> damages<-matrix(c(3,4, 2,2), ncol=2, dimnames=list(answer=c("yes", "no"), year=c("year1", "year2")))
> chisq.test(damages)
> 
> Does that make sense? Should I maybe be doing a different test instead?

The procedure is fine as such. A more automated way would be:

mat <- cbind(table(year1),table(year2))
chisq.test(mat)

(some may prefer rbind(...), but the chi-square won't care)
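A self-contained sketch of the above. It assumes the answers from the question are stored as character vectors named year1 and year2 (the post does not say how the data were read in). The last lines check that chisq.test() gives the same result on the table()-based matrix as on the hand-built matrix from the question, even though the rows come out in a different order:

```r
# Answers from the question, stored as character vectors (an assumption:
# the original post does not say how the data were read in)
year1 <- c("yes", "no", "no", "no", "yes", "yes", "no")
year2 <- c("no", "yes", "no", "yes")

# table() counts each answer; cbind() puts one column per year side by side.
# Rows come out in alphabetical order ("no", "yes").
mat <- cbind(year1 = table(year1), year2 = table(year2))
print(mat)

# The hand-built matrix from the question, with "year2" quoted
damages <- matrix(c(3, 4, 2, 2), ncol = 2,
                  dimnames = list(answer = c("yes", "no"),
                                  year = c("year1", "year2")))

# Same statistic and p-value: row order does not matter to the chi-square.
# suppressWarnings() hides the small-expected-count warning.
res_mat <- suppressWarnings(chisq.test(mat))
res_damages <- suppressWarnings(chisq.test(damages))
print(c(mat = res_mat$p.value, damages = res_damages$p.value))
```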

The issue with the two-variable format is that it expects two factors cross-classifying the same individuals (i.e. two vectors of equal length), not two independent groups. So you might do:

answer <- c(year1,year2)
year <- rep(1:2, c(length(year1), length(year2)))
table(answer, year) # just for enlightenment
chisq.test(answer, year)
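A runnable sketch of this long format, again assuming character vectors year1 and year2 holding the answers from the question. Passing a vector of counts as the times argument of rep() gives seven 1s followed by four 2s, one group label per respondent; chisq.test() on the two equal-length vectors then agrees with chisq.test() on the cross-table:

```r
# Answers from the question (assumed stored as character vectors)
year1 <- c("yes", "no", "no", "no", "yes", "yes", "no")
year2 <- c("no", "yes", "no", "yes")

# One element per respondent: all year-1 answers, then all year-2 answers
answer <- c(year1, year2)

# Matching group labels: 1 repeated length(year1) times, then 2 repeated
# length(year2) times (times = c(7, 4) here)
year <- rep(1:2, c(length(year1), length(year2)))

# The 2 x 2 cross-classification of answer by year
tab <- table(answer, year)
print(tab)

# With two equal-length vectors, chisq.test() coerces them to factors and
# tabulates them itself. Small expected counts trigger a warning, silenced here.
res_long <- suppressWarnings(chisq.test(answer, year))
res_tab <- suppressWarnings(chisq.test(tab))
print(res_long$p.value)
```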


Another matter is that you are below the usual rule of thumb for chi-square: expected counts of 5 or more in all 4 cells, which is obviously not going to happen with 11 observations in total. fisher.test is an option, but you need pretty extreme configurations to obtain significance.
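A sketch of both points using the counts from the question. Expected counts under independence are row total times column total divided by the grand total, and chisq.test() returns the same values in its expected component. The sapply() loop then runs fisher.test() on every 2 x 2 table with the same margins as the data (5 "yes" in total, 7 respondents in year 1, 4 in year 2); the names expected, yes1 and p_all are illustrative only:

```r
damages <- matrix(c(3, 4, 2, 2), ncol = 2,
                  dimnames = list(answer = c("yes", "no"),
                                  year = c("year1", "year2")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(damages), colSums(damages)) / sum(damages)
print(round(expected, 2))

# chisq.test() stores the same matrix in $expected (and warns because
# every cell is below 5)
ct <- suppressWarnings(chisq.test(damages))
print(round(ct$expected, 2))

# Fisher's exact test on the observed table
print(fisher.test(damages)$p.value)

# Every table with the same margins, indexed by the number of "yes"
# answers in year 1 (which can only range from 1 to 5 given the margins)
yes1 <- 1:5
p_all <- sapply(yes1, function(a) {
  m <- matrix(c(a, 7 - a,         # year 1: yes, no
                5 - a, a - 1),    # year 2: yes, no
              ncol = 2)
  fisher.test(m)$p.value
})
print(round(setNames(p_all, yes1), 3))
```

With these margins only one of the five possible tables reaches p < 0.05 (one "yes" in year 1 and four in year 2), and the observed table is the most probable one given the margins, so its two-sided p-value is 1.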

(BTW, all of the above assumes that there are no empty cells, i.e. that both answers occur in both years. Caveat emptor.)
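A small illustration of that caveat, using hypothetical vectors y1 and y2 in which nobody reports damage in the second year. table() on a character vector only lists the values that occur, so the two tables have different lengths and cbind() recycles the shorter one, giving a wrong matrix. Converting to a factor with fixed levels (lev is an illustrative name) keeps the zero count:

```r
# Hypothetical data: no "yes" answers at all in year 2
y1 <- c("yes", "no", "no", "yes")
y2 <- c("no", "no", "no")

# table(y2) has a single cell ("no"), so cbind() recycles its count
# to fill both rows: the year2 column is wrong
bad <- cbind(year1 = table(y1), year2 = table(y2))
print(bad)

# Fixing the factor levels keeps a zero count for "yes" in year 2
lev <- c("yes", "no")
good <- cbind(year1 = table(factor(y1, levels = lev)),
              year2 = table(factor(y2, levels = lev)))
print(good)
```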

> 
> Any help would be appreciated, thank you.
> 
> Agnese

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


