# [R] Proportion test in three-choices experiment

Jonathan Baron baron at psych.upenn.edu
Sun Jul 17 21:05:52 CEST 2005

```
On 07/17/05 20:12, Rafael Laboissiere wrote:

> Thanks for your reply, Jonathan.  Thanks also to Spencer, who suggested
> using the BTm function.  I realize that my description of both the
> experiment and the involved issue was not clear.  Let me try again:
>
> My subjects do a recognition task where I present stimuli belonging to
> three different classes (let us say A, B, and C).  There are many of
> them.  Subjects are asked to recognize each stimulus as belonging to one
> of the three classes (forced-choice design).  This is done under two
> different conditions (say conditions 1 and 2).  I end up with matrices of
> counts like this (in R notation):
>
> # under condition 1
> c1 <- t (matrix (c (c1AA, c1AB, c1AC,
>                     c1BA, c1BB, c1BC,
>                     c1CA, c1CB, c1CC), nc = 3))
> # under condition 2
> c2 <- t (matrix (c (c2AA, c2AB, c2AC,
>                     c2BA, c2BB, c2BC,
>                     c2CA, c2CB, c2CC), nc = 3))
>
> where "cijk" is the number of times the subject gave answer k when
> presented with a stimulus of class j, under condition i.
>
> The issue is to test whether subjects perform better (in the sense of a
> higher recognition score) in condition 1 compared with condition 2.  My
> first idea was to test the global recognition rate, which could be
> computed as:
>
> # under condition 1
> r1 <- sum (diag (c1)) / sum (c1)
> # under condition 2
> r2 <- sum (diag (c2)) / sum (c2)
>
> The null hypothesis is that r1 is not different from r2. I guess that I
> could test it with the chisq.test function, like this:
>
> p1 <- sum (diag (c1))
> q1 <- sum (c1) - p1
> p2 <- sum (diag (c2))
> q2 <- sum (c2) - p2
> chisq.test (matrix (c(p1, q1, p2, q2), nc = 2))
>
> What do you think?
>
> I also thought about testing the triples like [c1AA, c1AB, c1AC] against
> [c2AA, c2AB, c2AC], hence my original question.

You still aren't saying whether you are doing this for each
subject or for the entire data set summed over subjects.  If the
latter, are you worried about subject variance?  Do you think it
possible that some subjects might show better performance in
condition 2?  Would you be happy if you tested a single subject
and got the result?  If subject variance is an issue, then you
need to test "across subjects."  One way to do that is to
compute some performance measure for each subject and each
condition and then do a matched-pairs t test across subjects.
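
# A minimal sketch of the across-subjects test, assuming one 3x3 count
# matrix per subject and condition.  The names c1.list and c2.list are
# hypothetical (they are not in your message), and proportion correct
# is only one possible performance measure.
prop.correct <- function (m) sum (diag (m)) / sum (m)
s1 <- sapply (c1.list, prop.correct)  # one score per subject, condition 1
s2 <- sapply (c2.list, prop.correct)  # same subjects, same order, condition 2
t.test (s1, s2, paired = TRUE)        # matched-pairs t test across subjects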

The method you suggest requires several assumptions, and I don't
know if these are reasonable.  The problem is in using a sum of
the diagonal (p1) and off-diagonal entries (q1) in the table.
This may work if you have no reason to think that c2 is better,
ever.  In that case, all you need is a measure that varies
monotonically with the true measure, whatever it is.  You need
also to assume that c1 and c2 do not differ in response biases,
and that it could not be the case that one of the diagonal cells
is better in c1 and another is better in c2.

I have not studied these issues much since my PhD thesis (1970!),
but then the usual approach was to develop a sensible model of
the task and then use some parameter of the model as the
measure.  Perhaps this is over-kill for what you are doing, but I
don't know.  For example, one model says that the subject either
knows the answer or guesses, and the guesses are distributed
across the three categories according to biases that are specific
to the condition, but knowing the answer is independent of the
category.  (You can test the assumptions of this model.)  Another
model (popular in 1970) is Luce's choice theory, which is similar
to the first but uses multiplication.  If I remember correctly
(which I probably don't) you would do exactly what you propose but
after taking the logs of the frequencies.
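
# A rough sketch of the first (know-or-guess) model's predictions for
# one condition.  The names p.know and bias, and the values given to
# them, are made up for illustration only.
p.know <- 0.6                          # probability of knowing the answer
bias <- c (A = 0.5, B = 0.3, C = 0.2)  # guessing probabilities, sum to 1
# Rows are stimulus classes, columns are responses: a known answer lands
# on the diagonal, a guess is spread over the columns according to bias.
pred <- p.know * diag (3) + (1 - p.know) * matrix (bias, 3, 3, byrow = TRUE)
rowSums (pred)                         # each row of probabilities sums to 1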

It is possible to get different, even opposite, results using
logs than you would get with your proposal.  Likewise, it is
possible to get opposite results if you ignore response bias, and
if the conditions differ in response bias.

The suggestion I made based on the idea of inter-rater agreement
implies a rough-and-ready model similar to the first.  It does
take response bias into account.

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania