[R] Extracting Hash-tagged word from Tweets

R. Michael Weylandt michael.weylandt at gmail.com
Wed May 9 05:04:50 CEST 2012


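library(stringr)

# The tweet from the question (split with paste() so the mail client can wrap it)
x <- paste("HollandUKTrade: #Dutch companies striking Olympic gold at London",
           "2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade",
           "@dutchembassyUK")

# Match "#" followed by one or more ASCII letters or digits
str_extract_all(x,
  "#[1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+")
# returns list(c("#Dutch", "#london2012", "#olympics", "#sport"))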
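If stringr is not available, base R can do the same job. The sketch below is my own substitution (it is not part of the stringr approach above): it uses regmatches() with gregexpr() and the same explicit character class, applied to a small made-up vector of tweets so you can see what happens when a tweet has no hashtags. The names tweets, pattern, tags and tag_strings are hypothetical.

```r
# Hypothetical sample tweets; the first is the one from the question.
tweets <- c(
  paste("HollandUKTrade: #Dutch companies striking Olympic gold at London",
        "2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade",
        "@dutchembassyUK"),
  "no hashtags in this one"
)

# Same explicit ASCII character class as in the stringr call.
pattern <- "#[1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+"

# gregexpr() locates every match in each tweet; regmatches() extracts the text.
# The result is a list with one character vector per tweet
# (character(0) for a tweet without hashtags).
tags <- regmatches(tweets, gregexpr(pattern, tweets))

# Collapse each tweet's hashtags into one comma-separated string
# ("" when there are none).
tag_strings <- vapply(tags, toString, character(1))
print(tag_strings)
```

The comma-separated form matches the "#london2012, #olympics, #sport" layout asked for in the question; drop the vapply() step if you want to keep the list.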

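As the documentation at ?regexp notes, there are shortcuts that avoid
listing the whole ASCII set, such as character ranges ([A-Za-z0-9]) and
character classes ([[:alnum:]]), but how they are interpreted depends on
the platform and locale in ways the documentation does not fully spell
out; listing the characters explicitly, as above, should be robust.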
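If typing out the character set is tedious, you can build it from R's built-in constants 0:9, LETTERS and letters. These are fixed ASCII vectors, so the resulting pattern should not depend on the locale. This is a minimal sketch; the sample tweet and the names ascii_alnum, pattern and tags are made up for illustration.

```r
# Assemble the explicit ASCII set "0-9A-Za-z" from built-in constants
# instead of relying on locale-dependent ranges such as [A-Za-z0-9].
ascii_alnum <- paste(c(0:9, LETTERS, letters), collapse = "")
pattern <- paste0("#[", ascii_alnum, "]+")

# Hypothetical tweet; "!" is not in the set, so it ends the first hashtag.
x <- "Go #Dutch! #london2012 #olympics"
tags <- regmatches(x, gregexpr(pattern, x))[[1]]
print(tags)
```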

Michael

On Tue, May 8, 2012 at 10:24 AM, Adedoyin-Olowe Mariam
<mariamolowe2008 at yahoo.com> wrote:
> Can someone help me with code I can use to extract words preceded by a hash tag from live tweets downloaded with twitteR?
> An example of what I require is:
> [[9]]
> [1] "HollandUKTrade: #Dutch companies striking Olympic gold at London 2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade @dutchembassyUK"  (Tweet download)
>
> I want code that will extract this:
> #Dutch companies #london2012, #olympics, #sport
>
> I have tried the code listed below with stringr, but it returns outputs I did not want:
>> str_extract_all("#<-a-z, #<-A-Z", "[[string1:string10]]")
> [[1]]
> character(0)
>> str_extract_all("#<-a-z, #<-A-Z", "[[string9]]")
> [[1]]
> character(0)
>
>> str_extract_all("#=[1:10]", "#+a-z")
> [[1]]
> character(0)
>> str_extract_all("#=[1:10]", "#+")
> [[1]]
> [1] "#"
>
> Any help will be highly appreciated.
> Mariam
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
