[R] Essay identification

Sun Jun 12 23:43:58 CEST 2005

I assume you know that the usual procedure is to 'score' each essay by a
vector giving the frequency of occurrence of commonly used words and phrases
(sometimes adding subject-matter-specific ones). These multivariate scores,
labeled by author, are then fed as a "training set" into your favorite
supervised learning/classification procedure. R has many of these -- trees,
logistic regression, boosting, random forests, SVMs, LDA, SOMs (whoops --
that's an unsupervised one), ... . Try
RSiteSearch("classification", restrict = "functions").
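
To make the scoring step concrete, here is a minimal sketch in base R. The essays, the author labels, the word list `vocab`, and the helpers `score_essay()` and `classify()` are all invented for illustration. The nearest-centroid rule is only a simple stand-in for whichever of the classifiers listed above you prefer; it is not a recommendation.

```r
# Toy data: hypothetical essays and authors, invented for illustration only.
essays <- c(
  "the cat sat on the mat and the dog sat too",
  "the dog and the cat and the bird",
  "of course it is a matter of taste of style",
  "it is of interest that a matter of style is of note"
)
authors <- factor(c("A", "A", "B", "B"))

# Commonly used words to count; a real analysis would use a much longer list.
vocab <- c("the", "and", "of", "it", "is", "a")

# Relative frequency of each vocabulary word in one text.
score_essay <- function(text, vocab) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]                    # drop empty tokens
  counts <- table(factor(words, levels = vocab))   # words outside vocab are ignored
  as.numeric(counts) / length(words)
}

# Training matrix: one row per essay, one column per vocabulary word.
X <- t(sapply(essays, score_essay, vocab = vocab, USE.NAMES = FALSE))
colnames(X) <- vocab

# Stand-in classifier: mean score vector (centroid) per author.
centroids <- apply(X, 2, function(col) tapply(col, authors, mean))

# Assign a new essay to the author with the closest centroid
# (squared Euclidean distance).
classify <- function(text) {
  x <- score_essay(text, vocab)
  d <- rowSums(sweep(centroids, 2, x)^2)
  names(which.min(d))
}

classify("the bird and the dog saw the cat")
```

The matrix `X` together with `authors` is the "training set"; any supervised method that accepts a numeric matrix and a factor of labels could replace the centroid step.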

The devil is in the details as to what works best, I believe. With only 78
exemplars in 10 groups, classification may be difficult unless there is a
lot of separation (disparate styles that you could probably detect
manually). It also depends on how large each group is (balance is generally
better).
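
One way to see how much the sample size and balance matter is to tabulate the group sizes and estimate accuracy by leave-one-out cross-validation. The sketch below uses simulated scores only; the group sizes, the five-dimensional features, and the helper `predict_nc()` are assumptions made up for illustration, and the nearest-centroid rule again stands in for a real classifier.

```r
set.seed(1)  # simulated data only; no real essays are involved

# Hypothetical, unbalanced group sizes for 10 authors, summing to 78.
sizes <- c(15, 12, 10, 9, 8, 7, 6, 5, 4, 2)
ids <- paste0("s", 1:10)
authors <- factor(rep(ids, times = sizes), levels = ids)

# Simulated 5-dimensional scores: each author has a mean vector and
# that author's essays scatter around it.
p <- 5
mu <- matrix(rnorm(10 * p), nrow = 10)
X <- mu[as.integer(authors), ] + matrix(rnorm(78 * p), nrow = 78)

table(authors)  # inspect balance before fitting anything

# Nearest-centroid prediction for a single new score vector.
predict_nc <- function(Xtrain, ytrain, xnew) {
  cents <- apply(Xtrain, 2, function(col) tapply(col, ytrain, mean))
  d <- rowSums(sweep(cents, 2, xnew)^2)
  names(which.min(d))
}

# Leave-one-out: hold out each essay in turn, train on the other 77.
pred <- sapply(seq_len(nrow(X)), function(i) {
  predict_nc(X[-i, , drop = FALSE], authors[-i], X[i, ])
})
loo_acc <- mean(pred == as.character(authors))

# Baseline: always guess the author with the most essays.
baseline <- max(table(authors)) / length(authors)

c(loo = loo_acc, baseline = baseline)
```

If the leave-one-out accuracy on the real scores is not clearly above the baseline, the groups are probably not separated well enough for the 79th essay to be attributed with any confidence.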

Cheers,
Bert

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Werner Bier
Sent: Sunday, June 12, 2005 12:30 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Essay identification

Hi R-help,

I have a database of 10 students who have written a total of 78 essays.
The challenge? I would like to identify who wrote the 79th essay.

Has anybody used R in this context? 

Even if not, could you suggest which pattern recognition technique I
might apply?

Thanks a lot and regards,
Tom 

