[R] statistical test for comparison of two classifications (nominal)

Martin Tomko martin.tomko at geo.uzh.ch
Wed Nov 17 16:01:33 CET 2010


Thanks Matt,
In the meantime I have identified the Rand index, but not the others. I 
will also have a look at profdpm, which did not come up in my searches.
Indeed, the interpretation is going to be critical... Could you please 
elaborate on what you mean by the bootstrap process?

Thanks a lot for your help,
Martin

On 11/17/2010 3:50 PM, Matt Shotwell wrote:
> There are several statistics used to compare nominal classifications, or
> _partitions_ of a data set. A partition isn't quite the same in this
> context because partitioned data are not restricted to a fixed number of
> classes. However, the statistics used to compare partitions should also
> work for these 'restricted' partitions. See the Rand index, Fowlkes and
> Mallows index, Wallace indices, and the Jaccard index. The profdpm
> package implements a function (?profdpm::pci) that computes these
> indices for two factors representing partitions of the same data.
>
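
To make the indices named above concrete, here is a minimal sketch of the
pair-counting definitions they share. It is plain Python used as a
language-neutral illustration, not the profdpm::pci implementation; the
function names pair_counts and partition_indices and the example labels are
made up for this sketch. Every index is derived from how many pairs of items
are grouped together or apart in the two classifications.

```python
from itertools import combinations
from math import sqrt


def pair_counts(a, b):
    """Count item pairs by co-membership in two labelings of the same items.

    Returns (n11, n10, n01, n00): pairs grouped together in both labelings,
    only in a, only in b, and in neither.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same items")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def _ratio(num, den):
    # An index is undefined when its denominator is zero; report NaN.
    return num / den if den else float("nan")


def partition_indices(a, b):
    """Rand, Jaccard, Fowlkes-Mallows and both Wallace indices."""
    n11, n10, n01, n00 = pair_counts(a, b)
    return {
        "rand": _ratio(n11 + n00, n11 + n10 + n01 + n00),
        "jaccard": _ratio(n11, n11 + n10 + n01),
        "fowlkes_mallows": _ratio(n11, sqrt((n11 + n10) * (n11 + n01))),
        "wallace_a_to_b": _ratio(n11, n11 + n10),
        "wallace_b_to_a": _ratio(n11, n11 + n01),
    }


# Hypothetical data: a benchmark classification and one clustering result.
benchmark = ["a", "a", "a", "b", "b", "b"]
clustered = [1, 1, 2, 2, 3, 3]
print(partition_indices(benchmark, clustered))
```

Because only the co-membership of pairs is compared, the class labels
themselves do not have to match between the two classifications.
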
> The difficult part is drawing statistical inference about these indices.
> It's difficult to formulate a null hypothesis, and even more difficult
> to determine a null distribution for a partition comparison index. A
> bootstrap test might work, but you will probably have to implement this
> yourself.
>
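
One way to read the resampling suggestion above is a label-shuffling
(permutation) test. This is a related technique, not a bootstrap proper, and
not necessarily what Matt has in mind. The sketch below, in plain Python,
assumes the null hypothesis "the two classifications are unrelated" and
approximates the null distribution of the Rand index by shuffling one label
vector. The function names, the number of permutations, and the example data
are all assumptions made for illustration.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Share of item pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def permutation_test(a, b, n_perm=999, seed=0):
    """Return the observed Rand index and a one-sided permutation p-value.

    Shuffling b keeps its class sizes but breaks any link to a, which is
    the assumed null hypothesis of unrelated classifications.
    """
    rng = random.Random(seed)
    observed = rand_index(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if rand_index(a, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (1 + hits) / (1 + n_perm)


# Hypothetical data: a clustering that reproduces the benchmark exactly.
benchmark = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
clustered = [1] * 4 + [2] * 4 + [3] * 4
observed, p_value = permutation_test(benchmark, clustered)
print(observed, p_value)
```

A small p-value here only says that the agreement is larger than expected
for unrelated labelings; whether that is the right null hypothesis for a
given comparison is exactly the difficulty described above.
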
> -Matt
>
> On Wed, 2010-11-17 at 08:33 -0500, Martin Tomko wrote:
>    
>> Dear all,
>> I am having a hard time figuring out a suitable test for the match
>> between two nominal classifications of the same set of data.
>> I have used hierarchical clustering with multiple methods (Ward,
>> k-means, ...) to classify my data into a set number of classes, and I
>> would like to compare the resulting automated classification with the
>> actual, objective benchmark one.
>> So in principle I have a data frame with n columns of nominal
>> classifications, and I want to do a mutual comparison and test for
>> significant differences in classification between pairs of columns.
>>
>> I just need to identify a suitable test, but so far I have failed. I am
>> currently exploring the possibility of using Cohen's Kappa, but I am open
>> to other suggestions. In particular, the fact that kappa seems to be
>> mostly used on fallible, human annotators seems to bring in limitations
>> that do not apply to my automatic classification.
>> Any help will be appreciated, especially if also followed by a pointer
>> to an R package that implements it.
>>
>> Thanks
>> Martin
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>      
>    


-- 
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email: 	martin.tomko at geo.uzh.ch
site:	http://www.geo.uzh.ch/~mtomko
mob: 	+41-788 629 558
tel: 	+41-44-6355256
fax: 	+41-44-6356848


