[R] Newbie question regarding graphing of Princomp object

List account lists at norvelle.org
Sat Jan 15 05:39:00 CET 2005


Greetings,

I am working on a stylometric analysis of some Latin texts; one of the
more recent stylometric techniques involves principal components
analysis (PCA).  Not being a statistician, I can't rely on PCA as my
primary tool, since I don't fully understand the statistics behind
it.  Nevertheless, being able to run PCA and graph the results has
been marvelously helpful as a preliminary technique for deciding which
stylometric variables are worth pursuing as indicators of authorship.

For instance, here is what I'm doing.  I have data for approximately
120 different Latin works, about half of which are by St. Thomas
Aquinas, and the other half by various other authors in the Thomistic
tradition, some known and some anonymous.  My data for frequencies of
prepositions look like the following:

A,AD,CIRCA,CUM,DE, .... (total of 10 variables)
1,0.00967667222531036,0.0208124884194923,0.00142671854734112,0.00486381322957198,0.00758291643505651 ...
2,0.00874917700292081,0.0217315416668508,0.00133005165549453,0.00437900727772451,0.00537323193714733 ...
3,0.0064258378627327,0.0280901956627422,0.00178739176045295,0.00430582309573329,0.00821688482105979 ...
4,0.00706850368364528,0.027446604903448,0.000821141574836712,0.00461761547172807,0.00812783899774761 ...
5,0.010214039424891,0.015409971157808,0.000745993537614122,0.00584650749246416,0.00475787738815518 ...
6,0.00952534711010655,0.0180981595092025,0.00125928317726832,0.00515014530190507,0.00447206974491443 ...
.... (and so on for the rest of the 120 works)

The works are numbered such that works 100 and below are by St. Thomas,  
those from 101 to 117 are of dubious authenticity, and those from 118  
to 179 are by other authors.
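
The numbering scheme above maps onto a grouping factor fairly directly.
A minimal sketch, assuming the work numbers are available as a numeric
vector (the name work_id, the sample values, and the group labels are
hypothetical, chosen only for illustration):

```r
# Hypothetical work numbers; in practice these would come from the
# first column of the data file.
work_id <- c(1, 50, 100, 101, 117, 118, 179)

# cut() uses right-closed intervals by default, so the breaks below
# give (0,100], (100,117] and (117,179].
group <- cut(work_id,
             breaks = c(0, 100, 117, 179),
             labels = c("Aquinas", "dubious", "other"))

# Count how many works fall in each category.
table(group)
```

The resulting factor can then be used to pick a color or plotting
symbol for each work.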

When I call biplot() on the result of the princomp() function, I get
a nice graph that plots the 120 works on the first two principal
component axes (I've already figured out how to get rid of the red
arrows).  Since the data points tend to jumble together, I'd like some
way to color the different categories of works in the biplot, so that
data points for works 1-100 are red, those from 101-117 are blue, and
those from 118 to 179 are green (for instance).
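
One possible approach, sketched here with simulated data rather than
the real frequencies, is to skip biplot() and plot the component scores
directly, since the col argument of biplot() for princomp objects
appears to take one color for the observations and one for the
variables rather than one color per point.  Everything below other
than princomp() and the base graphics calls (the names work_id, freq,
group and cols, the random data, and the choice of legend position) is
an assumption made for illustration:

```r
set.seed(42)

# Simulated stand-in for the real data: 120 works numbered within
# 1..179, each with 10 preposition frequencies.
work_id <- sort(sample(1:179, 120))
freq <- matrix(runif(120 * 10, min = 0, max = 0.03), ncol = 10,
               dimnames = list(as.character(work_id),
                               paste("V", 1:10, sep = "")))

# Principal components; the scores hold each work's coordinates.
pca <- princomp(freq)
scores <- pca$scores[, 1:2]

# Map each work number to its category, then to a color.
group <- cut(work_id, breaks = c(0, 100, 117, 179),
             labels = c("Aquinas", "dubious", "other"))
palette3 <- c("red", "blue", "green")
cols <- palette3[as.integer(group)]

# Empty plot frame first, then the work numbers as colored labels.
plot(scores, type = "n", xlab = "Comp.1", ylab = "Comp.2")
text(scores, labels = work_id, col = cols)
legend("topright", legend = levels(group), text.col = palette3, bty = "n")
```

Unlike biplot(), this draws no variable arrows and does not rescale
the scores, so the axes will not match the biplot exactly, although
the relative positions of the works should be comparable.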

I've included a sample of the output that I'm currently getting, in  
case it's helpful to anybody.  BTW, I am running RAqua (for the Mac),  
version 1.8.1.

Thanks in advance for any help!

-Erik Norvelle
erik (at) norvelle (dot) org
Facultad de Filosofía y Letras
Universidad de Navarra
Pamplona, Navarra, España

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prepositions.pdf
Type: application/pdf
Size: 12639 bytes
Desc: not available
Url : https://stat.ethz.ch/pipermail/r-help/attachments/20050115/3611db92/prepositions.pdf