[R] Grabbing Specific Words from Content (basic text mining)

arun smartpink111 at yahoo.com
Mon Jan 14 15:49:30 CET 2013


You could do either:
Lines<-readLines(textConnection("Name: John Smith Age: 35 Address: 32, street, sub, something
Name Adam Grey Age: 25 Address: 26, street, sub, something"))   
 #":?" makes each colon optional, so the second line ("Name Adam Grey") also matches
 Name<-gsub("Name:? (.*) Age:? (.*) Address:? (.*)","\\1",Lines)
 age<-gsub("Name:? (.*) Age:? (.*) Address:? (.*)","\\2",Lines)
 Address<-gsub("Name:? (.*) Age:? (.*) Address:? (.*)","\\3",Lines)
 res<-data.frame(Name,age=as.numeric(age),Address,stringsAsFactors=FALSE)
 res
 #       Name age                    Address
#1 John Smith  35 32, street, sub, something
#2  Adam Grey  25 26, street, sub, something
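
The three gsub() calls above repeat the same pattern once per field. As a rough sketch of an alternative (not part of the original reply), regexec() and regmatches() from base R can pull all three capture groups in a single pass. The names `pattern`, `fields` and `parsed` are made up for illustration, and the sketch assumes every line matches the pattern.

```r
# Sample input copied from the message above; the second record has no colon after "Name".
Lines <- c("Name: John Smith Age: 35 Address: 32, street, sub, something",
           "Name Adam Grey Age: 25 Address: 26, street, sub, something")

# One pattern with three capture groups; ":?" makes each colon optional.
pattern <- "^Name:? (.*) Age:? (.*) Address:? (.*)$"

# regexec() finds the match positions; regmatches() turns them into
# character vectors of the form c(full match, group 1, group 2, group 3).
m <- regmatches(Lines, regexec(pattern, Lines))

# Drop the full match and stack the groups into a matrix, one row per record.
# Assumption: every line matched; a non-matching line yields character(0).
fields <- do.call(rbind, lapply(m, function(x) x[-1]))

parsed <- data.frame(Name = fields[, 1],
                     age = as.numeric(fields[, 2]),
                     Address = fields[, 3],
                     stringsAsFactors = FALSE)
parsed
```

If some lines might not match, it is probably safer to check `lengths(m) == 4` before binding the rows.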

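#or split on the field labels (the line that built this `res` is missing from the archived message; the read.table() call below is an assumed reconstruction)
res<-read.table(text=gsub("Name:?|Age:?|Address:?","|",Lines),sep="|",stringsAsFactors=FALSE)[,-1]
#strip leading and trailing whitespace from the character columns
res[sapply(res,is.character)]<-do.call(cbind,lapply(res[sapply(res,is.character)],function(x) sub("^[[:space:]]*(.*?)[[:space:]]*$","\\1",x)))
str(res)
#'data.frame':    2 obs. of  3 variables:
# $ V2: chr  "John Smith" "Adam Grey"
# $ V3: num  35 25
# $ V4: chr  "32, street, sub, something" "26, street, sub, something"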
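
On R 3.2.0 or later, the whitespace-stripping regex above can probably be replaced by base R's trimws(). A minimal sketch, where the small `res` data frame is a hypothetical stand-in with padded character columns and not the object from the message:

```r
# Hypothetical stand-in for an untrimmed result: character columns carry stray spaces.
res <- data.frame(V2 = c(" John Smith ", " Adam Grey "),
                  V3 = c(35, 25),
                  V4 = c(" 32, street, sub, something", " 26, street, sub, something"),
                  stringsAsFactors = FALSE)

# Trim only the character columns; the numeric column is left untouched.
is_chr <- sapply(res, is.character)
res[is_chr] <- lapply(res[is_chr], trimws)

str(res)
```

Assigning the list returned by lapply() back into `res[is_chr]` keeps the columns as character vectors, so no cbind() step is needed.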

----- Original Message -----
From: Sachinthaka Abeywardana <sachin.abeywardana at gmail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Monday, January 14, 2013 4:30 AM
Subject: [R] Grabbing Specific Words from Content (basic text mining)

Hi all,

Suppose I have a data frame with mixed content (name, age, and address).

a<-"Name: John Smith Age: 35 Address: 32, street, sub, something"

1. The question is: how do I extract the name, age, and
address separately from this data frame (containing potentially more
rows)?
2. Also, just in case I have to deal with it, how would the syntax change if I
had "Name" as opposed to "Name:" (without the colon)?

Any thoughts are much appreciated.

