[R] Calculating distance between words in string

Jim Lemon drjimlemon at gmail.com
Wed Nov 11 23:02:43 CET 2015


Perhaps what you are seeking is a sparse distance matrix.

"How far is each word from every other matching word"

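sentence<-"How far is each word from every other matching word"
# split on single spaces and lower-case so that "Word" and "word" match
words<-tolower(unlist(strsplit(sentence," ")))
nwords<-length(words)
# start with all NA; only matching pairs are filled in
wdm<-matrix(NA,nrow=nwords,ncol=nwords)
for(word in seq_len(nwords)) {
 # exact comparison: grep(...,fixed=TRUE) would also match substrings,
 # e.g. "is" inside "this"
 wordmatch<-which(words==words[word])
 # signed offset from the current word to each of its matches
 wdm[word,wordmatch]<-wordmatch-word
}
rownames(wdm)<-colnames(wdm)<-words
wdm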

The result contains zeros for a self-match, relative positions for the
desired matches and NA for non-matches.
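
If only one keyword is of interest, as in the original question, the same idea can be reduced to one row per occurrence of that keyword. The following is only a rough sketch: keyword_offsets is a made-up helper name, not something from a package, and it assumes that words are separated by whitespace and that punctuation has already been dealt with.

```r
# Hypothetical helper (an illustration only): signed offsets of every word
# relative to each occurrence of a single keyword.
keyword_offsets <- function(sentence, keyword) {
  # lower-case, then split on runs of whitespace; punctuation is left as is
  words <- strsplit(tolower(trimws(sentence)), "[[:space:]]+")[[1]]
  # exact, whole-word matches only
  hits <- which(words == tolower(keyword))
  # one row per keyword occurrence, one column per word in the sentence
  offsets <- outer(hits, seq_along(words), function(k, w) w - k)
  dimnames(offsets) <- list(sprintf("%s@%d", tolower(keyword), hits), words)
  offsets
}

res <- keyword_offsets("How far is each word from every other matching word",
                       "word")
res
```

For the example sentence this gives two rows, one for "word" at position 5 and one for "word" at position 10. Negative values are words before the keyword and positive values are words after it; take abs(res) if only the distance matters.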

Jim



On Thu, Nov 12, 2015 at 12:15 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:

> > -----Original Message-----
> > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Karl
> > Subject: [R] Calculating distance between words in string
> >
> > .. given a specific keyword, I need to assign labels to the other words
> > based on the distance (number of words) to this keyword.
> >
> >...
> > If the sentence contains more than one instance of the keyword, I need
> > values for each instance.
>
> What would you like to happen when the sentence contains more than one
> instance of other words and more than one instance of both?
>
> e.g. what output do you want from
> " amet is not the only instance of 'amet', and there is more than one
> instance of 'instance', 'is', 'of' and 'and'."
>
>
> S Ellison


