[R] Couple of Questions about Classification trees

Thu Mar 12 14:45:40 CET 2009

The issue with the sample size is that there are so many measurements 
(1024) in comparison to the number of meat samples (231).

Aside from that, you should check out the rpart package.  Its commands 
are similar to those of the tree package, but there are more options for 
the plots.  I don't know immediately how to display misclassification 
rates, but the text.rpart command (with use.n = TRUE) can display the 
number of observations from each class in each node, which shows how 
many were correctly and incorrectly classified.
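
A minimal sketch of what that could look like, under stated assumptions: 
the built-in iris data stands in for the meat spectra (which are not 
available here), and the names train, test, fit and node_err are made up 
for illustration.  It plots an rpart tree with per-class counts at every 
node, computes a training misclassification rate per node from the 
fitted object, and tabulates which classes get confused on the test 
half.

```r
library(rpart)

# ASSUMPTION: iris is a stand-in for the 231 x 1024 meat data set.
set.seed(1)
idx   <- sample(nrow(iris), nrow(iris) %/% 2)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit a classification tree (method = "class").
fit <- rpart(Species ~ ., data = train, method = "class")

# Draw the tree; use.n = TRUE adds the per-class observation counts
# under each node label, all = TRUE labels interior nodes too.
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)

# Per-node training misclassification rate. With the default loss
# matrix, frame$dev holds the number of misclassified observations
# in each node and frame$n the node size.
node_err <- with(fit$frame, dev / n)
names(node_err) <- rownames(fit$frame)
print(round(node_err, 3))

# Which classes are confused with each other on the held-out half.
# Note the argument is newdata, not data.
pred     <- predict(fit, newdata = test, type = "class")
conf_mat <- table(observed = test$Species, predicted = pred)
print(conf_mat)

# Overall test misclassification rate.
test_err <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
```

The off-diagonal cells of conf_mat are where, for example, chicken 
versus turkey confusion would show up with the real data.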

Ed

--
Ed Merkle, PhD
Assistant Professor
Dept. of Psychology
Wichita State University
Wichita, KS, USA 67260

> Date: Wed, 11 Mar 2009 13:53:46 -0700 (PDT)
> From: Jen_mp3 <Jen_mp3 at msn.com>
> Subject: Re: [R] Couple of Questions about Classification trees
> To: r-help at r-project.org
> Message-ID: <22464302.post at talk.nabble.com>
> Content-Type: text/plain; charset=us-ascii
> 
> 
> 
> Okay, perhaps I should've been more clear about the data. I'm actually working
> on spectroscopic measurements from food authenticity testing. I have five
> different types of meat: 55 of chicken, 55 of turkey, 55 of pork, 34 of beef
> and 32 of lamb - 231 in total. On each of these 231 meats, 1024
> spectroscopic measurements were taken, giving a matrix of 231 by 1024. The
> questions I want answered are which of the 1024 measurements are important
> for predicting meat type and which of the different types of meat are
> incorrectly classified - i.e., can we tell the difference between chicken and
> turkey? To carry out a multivariate analysis on the data, I've split it
> into two: a training data set and a test data set - half and half, although I
> think the larger half (55 goes into 27 and 28) went into the test data set,
> which explains the inequalities in the row numbers. By the way, 1024 is
> standard - can't change that. Can't change the 231 either.
> 
> So I created a new column with the meat type for each row.
> 
> I ended up with the following R code:
> library(tree)
> meat.tree <- tree(meat.type ~ ., data = train)
> Using cv.tree, the lowest misclassification rate is at 5 nodes, so I cut the
> number of nodes down to 5 using prune.tree:
> prunedtree <- prune.tree(meat.tree, best = 5, method = "misclass")
> Then I want to use predict.tree and the test data set:
> predicttree <- predict(prunedtree, newdata = test)
> I already said what it produces.
> 
> Again, how would I display the misclassification rate at each node on the
> diagram? I know about misclass.tree(prunedtree, detail = TRUE), but that
> doesn't actually display them on the classification tree - it just prints a
> bunch of numbers on the worksheet, and it wouldn't look very neat if I
> had to add them later.
> 
> --
> View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22464302.html
> Sent from the R help mailing list archive at Nabble.com.