[R] help: about lemmatization with treetagger tool

Ranjana Girish ranjanagirish30 at gmail.com
Tue Aug 16 09:16:33 CEST 2016


To do lemmatization in R, I ran the code below:

library("koRpus")
tagged.results <- treetag(c("run", "ran", "running"), treetagger="manual",
                          format="obj", TT.tknz=FALSE, lang="en",
                          TT.options=list(path="C:/Program Files/TreeTagger",
                                          preset="en"))
tagged.results@TT.res

and got the following error:

> source('D:/Rprograms/lemanew.R')
Assuming 'UTF-8' as encoding for the input file. If the results turn out to
be erroneous, check the file for invalid characters, e.g. em.dashes or
fancy quotes, and/or consider setting 'encoding' manually.
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow =
TRUE,  :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\WINDOWS\system32\cmd.exe /c type
 C:\Users\SULOCH~1\AppData\Local\Temp\Rtmp2De3bl\tokenize5044ef851b9.txt |
 grep -v '^$' | C:\Program Files\TreeTagger\bin\tree-tagger.exe C:\Program
Files\TreeTagger\lib\english-utf8.par -token -lemma -sgml -pt-with-lemma
-quiet | perl -pe 's\\tV[BDHV]\\tVB\;s\IN\\that\\tIN\;'' had status 255



I do not understand this error. Could someone explain what it means?

Also, if there is another way to do lemmatization in R that gives correct
output, please share the code.
