[R] Problem with Extracting Hash Tagged Words from Tweets

R. Michael Weylandt michael.weylandt at gmail.com
Tue May 22 17:08:20 CEST 2012


"The presence of these numbers in square brackets is reporting error."

You mean the square brackets that show up on the left hand side when
you do something like

x <- 1:100
print(x)

?

Don't worry -- those aren't part of x -- they're just added on
printing to make it easier to see where you are in the vector (or
list). They won't be included in any analysis. If you need control
over the printing to avoid them, take a look at cat().
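
A minimal sketch of the point above, using only base R. The small
`tweets` list is made-up sample data, not the poster's actual download;
it is only there to show that `[[1]]` and `[1]` appear in the printout
but not in the stored values.

```r
# Index markers such as [1] are added by print(); they are not stored in x.
x <- 1:100
printed <- capture.output(print(x))   # the console text, captured as strings
printed[1]                            # first printed line carries a "[1]" marker

# The data itself is just 100 integers, with no brackets anywhere
length(x)
is.integer(x)

# cat() writes the values without any index markers
cat(x[1:5], "\n")

# The same holds for a list: [[1]], [[2]] are display labels only
tweets <- list("first #tag", "second #other")   # hypothetical sample data
shown <- capture.output(print(tweets))
in_printout <- any(grepl("[[1]]", shown, fixed = TRUE))
in_data <- any(grepl("[[1]]", unlist(tweets), fixed = TRUE))
in_printout   # the label shows up in the printed text
in_data       # but not in the list elements themselves
```

So any function applied to the list (sapply(), unlist(), and so on)
only ever sees the tweet text.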

Michael
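
For the original goal (pulling the hashtags out of many tweets in one
go), here is a hedged sketch that needs no extra packages. The three
strings are shortened stand-ins for the tweets quoted below, and the
character class (letters, digits, underscore) is an assumption about
what counts as a hashtag, not the pattern used elsewhere in this thread.

```r
# Shortened, hypothetical stand-ins for the downloaded tweets
tweets <- c(
  "Get an insight at The News #dayatthenews today #pompeyhacks #portsmouth",
  "Looks like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0",
  "No tags in this one"
)

# Assumed definition of a hashtag: '#' followed by letters, digits or '_'
pattern <- "#[A-Za-z0-9_]+"

# gregexpr() locates every match in each element;
# regmatches() pulls the matched text out, one list entry per tweet
tags_per_tweet <- regmatches(tweets, gregexpr(pattern, tweets))

# Flatten to a single character vector if the per-tweet grouping is not needed
all_tags <- unlist(tags_per_tweet)
all_tags
```

Tweets without any hashtag give an empty entry in `tags_per_tweet` and
simply drop out of `all_tags`.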

On Tue, May 22, 2012 at 11:02 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi,
>
> On Tue, May 22, 2012 at 10:55 AM, Adedoyin-Olowe Mariam
> <mariamolowe2008 at yahoo.com> wrote:
>> Hi Sarah,
>>
>> Thanks for your help. I'm sorry my question is not clear enough.
>> Maybe what I should ask for is how to remove the downloaded
>> tweet numbers in
>> x <- list
>> (ie.[[1]], [1], [[2]], [2].....)
>> before > sapply(x, str_extract_all, "#\\<.*?\\>").
>
> Those aren't part of the tweets. Those are the numbers R uses when
> displaying portions of a list.
>
>> The presence of these numbers in square brackets is reporting error.
>
> What error? You'll need to give us an actual reproducible example,
> since what you are describing is unclear.
>
> Although I suppose it's possible that you simply want:
>> unlist(sapply(x, str_extract_all, "#\\<.*?\\>"))
> [1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"   "#southsea"
> [5] "#Portsmouth"   "#portsmouth"
>
> It's impossible for me to tell precisely what the problem is.
>
> Sarah
>
>>
>> Thanks.
>> Mariam
>>
>>
>> ________________________________
>> From: Sarah Goslee <sarah.goslee at gmail.com>
>> To: Adedoyin-Olowe Mariam <mariamolowe2008 at yahoo.com>
>> Cc: "r-help at r-project.org" <r-help at r-project.org>
>> Sent: Tuesday, 22 May 2012, 13:53
>> Subject: Re: [R] Problem with Extracting Hash Tagged Words from Tweets
>>
>> Hi,
>>
>> A small reproducible bit of your data would have been nice, and I have
>> no idea what "manually remove all regular expressions" might mean, but
>> take a look at this:
>>
>> x <- list("marymaryw: Get an insight into how journalists operate at
>> The News by following #dayatthenews today #pompeyhacks #portsmouth
>> #southsea", "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited
>> fitness classes at Outdoor Fitness Leeds - get bikini...
>> http://t.co/BUrkjtCh #Portsmouth", "BillieRaePhoto: RT @vintagesecret:
>> My dad has just sent me this picture. Looks like @GunwharfQuays is on
>> fire?! #portsmouth http://t.co/HbAV7Hw0")
>>
>>> sapply(x, str_extract_all, "#\\<.*?\\>")
>> [[1]]
>> [1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"  "#southsea"
>>
>> [[2]]
>> [1] "#Portsmouth"
>>
>> [[3]]
>> [1] "#portsmouth"
>>
>> Sarah
>>
>> On Tue, May 22, 2012 at 7:00 AM, Adedoyin-Olowe Mariam
>> <mariamolowe2008 at yahoo.com> wrote:
>>> Hello All,
>>> Can anyone help me solve this problem.
>>> Am trying to extract hash-tagged words from tweets downloaded from
>>> twitteR.
>>>
>>> I can extract hash-tagged words from single tweet using
>>> (stringr) str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
>>> but cannot with more than one tweet at a time except I manually remove all
>>> regular expressions and tweets numbers such as [[1]] and [1.]
>>>
>>> I want to automatically extract all #words in large number of tweets at a
>>> go.
>>> This is what I have done so far by removing all regular expressions
>>> manually:
>>>
>>>> searchTwitter("#Portsmouth", n=20)
>>> [[1]]
>>> [1] "marymaryw: Get an insight into how journalists operate at The News by
>>> following #dayatthenews today #pompeyhacks #portsmouth #southsea"
>>> [[2]]
>>> [1] "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited fitness
>>> classes at Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh
>>> #Portsmouth"
>>> [[3]]
>>> [1] "BillieRaePhoto: RT @vintagesecret: My dad has just sent me this
>>> picture. Looks like @GunwharfQuays is on fire?! #portsmouth
>>> http://t.co/HbAV7Hw0"
>>> [[4]]
>>> [1] "xangma: RT @vintagesecret: My dad has just sent me this picture.
>>> Looks like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
>>> [[5]]
>>> [1] "vintagesecret: My dad has just sent me this picture. Looks like
>>> @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
>>> [[6]]
>>> [1] "i_amnik: RT @BBCRadioSolent: Can you see the #GunwharfQuays fire?
>>> Eye-witnesses please call - 0845 30 30 961. #Portsmouth."
>>> [[7]]
>>> [1] "vickiredmond: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf
>>> on fire I have seen http://t.co/8LNAiqiD #portsmouth"
>>> [[8]]
>>> [1] "EmilieRosa: Highs of 25 degrees on the island this week!! Beach time
>>> after exams I think! ;) #Portsmouth"
>>> [[9]]
>>> [1] "MrYiff: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on
>>> fire I have seen http://t.co/8LNAiqiD #portsmouth"
>>> [[10]]
>>> [1] "otbsaad: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
>>> at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
>>> [[11]]
>>> [1] "PN_Newsdesk: #Portsmouth: Ferryspeed looks to build on its past
>>> successes http://t.co/CmDglDkg"
>>> [[12]]
>>> [1] "PN_Newsdesk: #Portsmouth: More room for stalls at top Southsea school
>>> - A SOUTHSEA primary school still has room for people to se...
>>> http://t.co/ucbYWjPR"
>>> [[13]]
>>> [1] "VouchAR_Ports: £14 instead of £30 for a pedicure with foiled transfer
>>> at Forever Young, Stoke-on-Trent - get... http://t.co/P7gJBcl8 #Portsmouth"
>>> [[14]]
>>> [1] "TelArnott: Looking forward to #K1 today! #gym01 #portsmouth"
>>> [[15]]
>>> [1] "dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on fire I have
>>> seen http://t.co/8LNAiqiD #portsmouth"
>>> [[16]]
>>> [1] "dan_germain: RT @portsmouthnews: News: Large fire at Gunwharf Quays -
>>> http://t.co/s9RWpY0i #portsmouth #southsea"
>>> [[17]]
>>> [1] "i_amnik: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
>>> at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
>>> [[18]]
>>> [1] "solentmotorcars: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
>>> large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>>> @BBCRadioSolent 96.1FM"
>>> [[19]]
>>> [1] "HantsChiefAlex: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
>>> large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>>> @BBCRadioSolent 96.1FM"
>>> [[20]]
>>> [1] "BBCRadioSolent: Can you see the #GunwharfQuays fire? Eye-witnesses
>>> please call - 0845 30 30 961. #Portsmouth."
>>>> tweets <-c("marymaryw: Get an insight into how journalists operate at The
>>>> News by following #dayatthenews today #pompeyhacks #portsmouth #southsea
>>>> VouchAR_Ports £5 instead of £60 for 1 month of unlimited fitness classes at
>>>> Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh #Portsmouth
>>>> BillieRaePhoto RT @vintagesecret My dad has just sent me this picture. Looks
>>>> like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 xangma: RT
>>>> @vintagesecret My dad has just sent me this picture. Looks like
>>>> @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 vintagesecret
>>>> My dad has just sent me this picture. Looks like @GunwharfQuays is on fire?!
>>>> #portsmouth http://t.co/HbAV7Hw0iamnik: RT @BBCRadioSolent Can you see the
>>>> #GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
>>>> #Portsmouth. vickiredmond @MatMacAulay Best pic of#Gunwharf on fire I have
>>>> seen http://t.co/8LNAiqiD #portsmouth EmilieRosa: Highs of 25 degrees on the
>>>> island this week!! Beach time after exams I think!) #Portsmouth mYiff RT
>>> @dan_germain: RT @MatMacAulay Best pic of #Gunwharf on fire I have seen
>>> http://t.co/8LNAiqiD #portsmouth otbsaad RT @BBCRadioSolent: BREAKING NEWS -
>>> Reports of a large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>>> @BBCRadioSolent 96.1FM PN_Newsdesk #Portsmouth: Ferryspeed looks to build on
>>> its past successes http://t.co/CmDglDkg PN_Newsdesk #Portsmouth More room
>>> for stalls at top Southsea school - A SOUTHSEA primary school still has room
>>> for people to se... http://t.co/ucbYWjPR VouchAR_Ports £14 instead of £30
>>> for a pedicure with foiled transfer at Forever Young, Stoke-on-Trent -
>>> get... http://t.co/P7gJBcl8 #Portsmouth TelArnott Looking forward to #K1
>>> today! #gym01 #portsmouth Best pic of #Gunwharf on fire I have seen
>>> http://t.co/8LNAiqiD #portsmouth dangermain RT @portsmouthnews News Large
>>> fire at Gunwharf Quays - http://t.co/s9RWpY0i #portsmouth #southsea iamnik
>>> RT @BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays
>>> in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM solentmotorcars RT
>>> @BBCRadioSolent: BREAKING NEWS - Reports of a large fire at #GunwharfQuays
>>> in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM HantsChiefAlex RT
>>> @BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays in
>>> #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM BBCRadioSolent Can you
>>> see the #GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
>>> #Portsmouth")
>>>> str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
>>> [[1]]
>>>  [1] "#dayatthenews"  "#pompeyhacks"   "#portsmouth"    "#southsea"
>>>  "#Portsmouth"    "#portsmouth"    "#portsmouth"
>>>  [8] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Gunwharf"
>>>  "#portsmouth"    "#Portsmouth"    "#Gunwharf"
>>> [15] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Portsmouth"
>>>  "#Portsmouth"    "#Portsmouth"    "#K1"
>>> [22] "#gym01"         "#portsmouth"    "#Gunwharf"      "#portsmouth"
>>>  "#portsmouth"    "#southsea"      "#GunwharfQuays"
>>> [29] "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#GunwharfQuays"
>>> "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"
>>>
>>> Please I need help.
>>>
>>> Mariam
>>
>
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


