[R] The KJV
(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Feb  7 09:28:34 CET 2010

On 07-Feb-10 01:06:40, Ben Bolker wrote:
> Jim Lemon <jim <at> bitwrit.com.au> writes:
> 
>> 
>> On 02/06/2010 06:57 PM, Charlotte Maia wrote:
>> > Hey all,
>> >
>> > Does anyone know if there are any R packages with a copy of the KJV?
>> > I'm guessing the answer is no...
>> >
>> > So the next question, and the more important one is:
>> > Does anyone think it would be useful (e.g. for text-mining purposes)?
>> > I know almost nothing about theology,
>> > so I'm not sure what kind of questions theologists might have (that R
>> > could answer).
>> >
>> > An alternative, that would achieve a similar result (I think),
>> > would be an R interface to another open source system, such as
>> > Sword.
>> >
>> Hi Charlotte,
>> Try
>> 
>> http://www.gutenberg.org/etext/10
>> 
>> Jim
>> 
> 
>  I couldn't help it:
> 
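> # Read the first 20000 lines of the Project Gutenberg text
> x <- url("http://www.gutenberg.org/dirs/etext90/kjv10.txt",open="r")
> X <- readLines(x,n=20000)
> close(x)
> # Drop the front matter, up to and including the Genesis title line
> z <- grep("First Book of Moses",X)
> X <- X[-(1:z)]
> X <- X[nchar(X)>0]  # drop blank lines
> length(X) ## 15058
> # Split on spaces and punctuation, lower-case everything
> words <- tolower(unlist(strsplit(X,"[ .,:;()]")))
> # Keep only tokens containing a non-digit (drops verse numbers and "")
> words2 <- grep("[^0-9]",words,value=TRUE)
> # Frequency table, most frequent first; plot the top 100 on a log scale
> tt <- rev(sort(table(words2)))
> barplot(rev(tt[1:100]),horiz=TRUE,las=1,cex.names=0.4,log="x")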
Delightful! And fascinating in the detail too. The number of
distinct words in the table is:
  length(tt)
  # [1] 5078
With slight changes you can step through the ranks 50 at a time, like:
  barplot(rev(tt[1:50]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...
  barplot(rev(tt[101:150]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...
and look up individual words, to see the likes of
  tt["lord"]
  # lord 
  # 1939 
  tt["god"]
  # god 
  # 822 
  tt["men"]
  # men 
  # 204 
  tt["women"]
  # women 
  #    26 
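For anyone who wants to try the same steps without downloading the
text, here is a minimal sketch on three made-up input lines (the
sample lines are an assumption for illustration, not the Gutenberg
file). It repeats the split / filter / tabulate steps and the lookup
by name:

```r
# Toy input standing in for the downloaded text (illustrative only)
X <- c("In the beginning God created the heaven and the earth.",
       "",
       "1:2 And the earth was without form, and void;")

X <- X[nchar(X) > 0]                                # drop blank lines
words <- tolower(unlist(strsplit(X, "[ .,:;()]")))  # split on spaces/punctuation
words2 <- grep("[^0-9]", words, value = TRUE)       # drop pure numbers and ""
tt <- rev(sort(table(words2)))                      # most frequent first

tt["the"]   # count for one word, looked up by name
```

Note that a word which does not occur at all comes back as NA from
this kind of lookup, not as 0.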
I'm now wondering how it matches up with Zipf's Law (or perhaps
Fisher's logarithmic ... )
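One rough way to look at the Zipf question is to regress log frequency
on log rank: under Zipf's Law the slope should be near -1. A minimal
sketch follows; zipf_fit is a hypothetical helper name, and the toy
counts are constructed to follow the law exactly, so they say nothing
about the KJV itself:

```r
# Hypothetical helper: least-squares slope of log(frequency) on log(rank)
zipf_fit <- function(freq) {
  freq <- sort(as.numeric(freq), decreasing = TRUE)  # works on a table too
  rank <- seq_along(freq)
  fit <- lm(log(freq) ~ log(rank))
  list(slope = unname(coef(fit)[2]), fit = fit)
}

# Toy counts with frequency exactly proportional to 1/rank
toy <- 1000 / (1:50)
res <- zipf_fit(toy)
res$slope   # -1 up to floating-point error
```

With the table from the code above, zipf_fit(tt)$slope would give the
corresponding estimate, and plotting log(freq) against log(rank) shows
where the straight line breaks down.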
Thanks, Ben!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-10                                       Time: 08:28:30
------------------------------ XFMail ------------------------------
    
    