[R] Converting English words to numeric equivalents

Hans-Joerg Bibiko bibiko at eva.mpg.de
Mon Jul 28 12:37:36 CEST 2008


On 28 Jul 2008, at 12:23, Hans-Joerg Bibiko wrote:
> How about this?
>
> unletter <- function(word) {
>  gsub('-64',' ',paste(sprintf("%02d",utf8ToInt(tolower(word)) -  
> 96),collapse=''))
> }
>
> unletter("abc")
> [1] "010203"
>
> unletter("Aw")
> [1] "0123"
>
> unletter("I walk to school")
> [1] "09 23011211 2015 190308151512"
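Because every letter is zero-padded to exactly two digits, this encoding can be reversed. The sketch below uses a hypothetical inverse, `reletter()`, which is not part of the thread. It cuts each space-separated group into two-character chunks and maps them back with `intToUtf8()`. It assumes the input came from `unletter()` with single spaces between words. Case cannot be recovered, because `unletter()` lower-cases its input first.

```r
# Hypothetical inverse of unletter(); not part of the original thread.
# Assumes input produced by unletter(): groups of two-digit codes (01-26)
# separated by single spaces.
reletter <- function(code) {
  groups <- strsplit(code, " ", fixed = TRUE)[[1]]
  decoded <- vapply(groups, function(g) {
    # cut the group into consecutive 2-character chunks: "2301" -> "23", "01"
    starts <- seq(1, nchar(g), by = 2)
    nums <- as.integer(substring(g, starts, starts + 1))
    intToUtf8(nums + 96)   # 1 -> "a", ..., 26 -> "z"
  }, character(1), USE.NAMES = FALSE)
  paste(decoded, collapse = " ")
}

reletter("09 23011211 2015 190308151512")
# [1] "i walk to school"
```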

I do not know precisely what you want to do.

With:
as.double(unlist(strsplit(unletter("I walk to school")," ")))

you will get a numeric vector out of the string.
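But this causes a problem with long words. A double represents integers exactly only up to 2^53 (about 16 digits), so the trailing digits of a long word's code are lost: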

as.double(unlist(strsplit(unletter("schoolschool")," ")))
[1] 1.903082e+23
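The following sketch shows the loss of precision directly. R stores these numbers as IEEE 754 doubles, so the 24-digit code for "schoolschool" is rounded to the nearest representable value. Printing that stored value as a whole number with `sprintf("%.0f", ...)` no longer reproduces the original digits, so the word cannot be recovered. The variable names here are illustrative.

```r
# Doubles hold every integer exactly only up to 2^53 (about 9.007e15).
2^53
# [1] 9.007199e+15

code <- "190308151512190308151512"   # unletter("schoolschool"): 24 digits
x <- as.double(code)

# Print the integer value the double actually stores and compare:
sprintf("%.0f", x) == code
# [1] FALSE   (the trailing digits were rounded away)
```

As a rough rule, a code of up to about 15 digits (seven letters) is safe. Anything longer may be rounded.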

Thus, if you need to map words to numeric values and the numbers themselves carry no meaning, I would suggest parsing your text beforehand to build a hash (a list) of all distinct words in it and assigning a number to each word.
This would give you a list like:
words <- list("abc" = 1, "I" = 2, "go" = 3)  # etc.

After that you can access these numeric values via:
words['go']
$go
[1] 3
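The sketch below builds such a lookup automatically from a text. It assumes that words are separated by whitespace and that case should be ignored. It uses a named integer vector instead of a list, and the names `txt`, `tokens`, `vocab` and `ids` are illustrative. With either structure, `[[ ]]` returns the bare number, whereas `words['go']` on a list returns a one-element sub-list, as shown above.

```r
txt <- "I walk to school and I walk home"

# Split on runs of whitespace; lower-case so "I" and "i" count as one word
tokens <- tolower(unlist(strsplit(txt, "[[:space:]]+")))

# One number per distinct word, in order of first appearance
vocab <- unique(tokens)
ids <- setNames(seq_along(vocab), vocab)   # named vector used as a hash

ids[["walk"]]
# [1] 2

# Encode the whole text as a vector of word numbers
unname(ids[tokens])
# [1] 1 2 3 4 5 1 2 6
```

`match(tokens, vocab)` gives the same encoding without building the named vector.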

--Hans

