[R] Regex with criteria from multiple lines

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Fri Feb 14 14:58:21 CET 2014


You need to use a JSON library or equivalent to solve this problem. I don't understand why you think that having the data in the clipboard prevents you from doing this: the clipboard is just another file, so you can read its contents and hand them to a JSON parser. (I usually avoid the clipboard anyway, because it makes an analysis harder to reproduce.)
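
As a rough sketch of what that could look like, assuming the jsonlite package (one of several JSON packages on CRAN; no particular package is named above) and a made-up, shortened tweet standing in for the clipboard contents:

```r
library(jsonlite)  # assumption: any JSON package with a similar parser would do

# Hypothetical, shortened tweet standing in for what is on the clipboard.
sample_json <- c(
  '{',
  '  "id_str": "433662713886429184",',
  '  "text": "Hond vast in water in Bargerveen bij Zwartemeer",',
  '  "entities": {',
  '    "hashtags": [',
  '      {"text": "Zwartemeer"},',
  '      {"text": "bargerveen"},',
  '      {"text": "hond"}',
  '    ]',
  '  }',
  '}'
)

# A text connection is used here so the example runs anywhere. The clipboard
# can be read the same way, e.g. readLines("clipboard") on Windows or
# readLines(pipe("pbpaste")) on macOS.
con <- textConnection(sample_json)
raw_lines <- readLines(con)
close(con)

# Parse the whole document instead of matching individual lines.
tweet <- fromJSON(paste(raw_lines, collapse = "\n"))

tweet$text                    # the top-level "text" field only
tweet$entities$hashtags$text  # the nested hashtag "text" fields, kept separate
```

Because the parser keeps the nesting, the top-level "text" and the "text" fields under "hashtags" are never confused, so no multi-line regex criteria are needed.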
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On February 14, 2014 1:29:59 AM PST, Mark Stam <digistam at gmail.com> wrote:
>Hello,
>
>I do data analysis on json data (Twitter). An example of the data:
>
>**********
>"      \"id\": 433662713886429200,"
>"      \"id_str\": \"433662713886429184\","
>"      \"text\": \"Hond vast in water in Bargerveen bij Zwartemeer -
>http://t.co/FqbkOMzYd1 #Zwartemeer #bargerveen #hond #innood\","
>"      \"source\": \"<a
>href=\"https://about.twitter.com/products/tweetdeck\"
>rel=\"nofollow\">TweetDeck</a>\","
>**********
>
>I get the contents of the "text" field like this:
>
>r <- regexpr("^( )*\"text(.*?),$", myjsondata)
>text <- regmatches(myjsondata,r)
>txt <- gsub("\"text\":|\",|\"","",text)
>
>Unfortunately, in json there are more fields with the same name, for
>example:
>
>**********
>"      \"id\": 433662713886429200,"
>"      \"id_str\": \"433662713886429184\","
>"      \"text\": \"Hond vast in water in Bargerveen bij Zwartemeer -
>http://t.co/FqbkOMzYd1 #Zwartemeer #bargerveen #hond #innood\","
>"      \"source\": \"<a
>href=\"https://about.twitter.com/products/tweetdeck\"
>rel=\"nofollow\">TweetDeck</a>\","
>...
>"      \"entities\":  {"
>"        \"hashtags\":  ["
>"           {"
>"            \"text\": \"Zwartemeer\","
>...
>"            \"text\": \"bargerveen\","
>...
>"            \"text\": \"hond\","
>etc.
>**********
>
>I only want to get the data from the text field between the "id_str" and
>the "source" fields. I don't want to have the data from the text fields
>below "hashtags". I do understand regex, but I don't understand how to do
>it with the criteria from multiple lines.
>
>I know it's possible to use a JSON library in R, but in my case I can't,
>because I get the JSON from raw "clipboard" data.
>
>Thanks !
>
>Mark Stam