[R] Analyzing Publications from Pubmed via XML

Farrel Buchinsky fjbuch at gmail.com
Fri Dec 14 21:04:22 CET 2007


> The problem is that the RSS feed you linked to, does not contain the
> year of the article in an easily accessible XML element. Rather you
> have to process the HTML content of the description element - which,
> is something R could do, but you'd be using the wrong tool for the job.
>

Yes. I have noticed that there are two sorts of XML that PubMed will
provide. The kind I had hooked into was an RSS feed, which provides a
lot of the information simply as a formatted table for viewing in an
RSS reader. There is another way to get the XML to come out with more
tags. However, I found that the best way to do this is probably through
the Bioconductor annotate package:

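library(annotate)  # provides pubmed() and buildPubMedAbst()
library(XML)       # provides xmlRoot() and xmlChildren()

# Fetch the full XML records for three PubMed IDs
x <- pubmed("18046565", "17978930", "17975511")
a <- xmlRoot(x)
numAbst <- length(xmlChildren(a))

# Build one pubMedAbst object per article
absts <- list()
for (i in seq_len(numAbst)) {
    absts[[i]] <- buildPubMedAbst(a[[i]])
}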

I am now trying to work through that approach to see what I can come up with.
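
As a rough sketch of why the fuller XML is easier to work with than the
RSS feed: once the year sits in its own element, it can be read directly
with an XPath query instead of being scraped out of HTML. The fragment
below is made up for illustration (the PMIDs and years are placeholders,
not real records), and it assumes the usual
PubmedArticle/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year
nesting. It uses only the XML package, not annotate.

```r
library(XML)

# Made-up fragment in the general shape of PubMed's fuller XML output.
# PMIDs and years are placeholders, not real records.
txt <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>1</PMID>
    <Article><Journal><JournalIssue>
      <PubDate><Year>2006</Year></PubDate>
    </JournalIssue></Journal></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>2</PMID>
    <Article><Journal><JournalIssue>
      <PubDate><Year>2007</Year></PubDate>
    </JournalIssue></Journal></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>3</PMID>
    <Article><Journal><JournalIssue>
      <PubDate><Year>2007</Year></PubDate>
    </JournalIssue></Journal></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- xmlParse(txt, asText = TRUE)

# One node per article, so PMIDs and years stay aligned
articles <- getNodeSet(doc, "//PubmedArticle")

pmids <- sapply(articles, function(art) {
    xpathSApply(art, "./MedlineCitation/PMID", xmlValue)[1]
})

# Use NA when an article has no <Year> under <PubDate>
years <- sapply(articles, function(art) {
    y <- xpathSApply(art, ".//PubDate/Year", xmlValue)
    if (length(y) > 0) as.integer(y[1]) else NA_integer_
})

# Count publications per year
table(years)
```

Going article by article, rather than running one query over the whole
document, is deliberate: some records may carry the date in a different
element, and a single flat query would then return fewer years than
PMIDs and silently misalign the two vectors.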
-- 
Farrel Buchinsky


