[R] Analyzing Publications from Pubmed via XML

Gabor Grothendieck ggrothendieck at gmail.com
Fri Dec 14 03:42:52 CET 2007


On Dec 13, 2007 9:03 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> I would like to track in which journals articles about a particular disease
> are being published. Creating a PubMed search is trivial. The search
> provides data, but obviously not as an R data frame. I can get the search to
> export the data as an XML feed, and the XML package seems to be able to read
> it.
>
> xmlTreeParse("
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-
> ",isURL=TRUE)
>
> But getting from there to a data frame in which one column would be the name
> of the journal and another column would be the year (to keep things simple)
> seems to be beyond my capabilities.
>
> Has anyone ever done this, and could you share your script? Are there any
> published examples where the end result is a data frame?
>
> I guess what I am looking for is a simple way to parse the feed and extract
> the data. Alternatively, how does one turn an RSS feed into a CSV file?

Try this:

library(XML)

url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-"
doc <- xmlTreeParse(url, isURL = TRUE, useInternalNodes = TRUE)

## extract the text of every <author> and <category> node in the feed
sapply(c("//author", "//category"), xpathApply, doc = doc, fun = xmlValue)


