[R] Analyzing Publications from Pubmed via XML

David Winsemius dwinsemius at comcast.net
Sat Dec 15 18:31:37 CET 2007


"Farrel Buchinsky" <fjbuch at gmail.com> wrote in
news:bd93cdad0712141216s23071d27n17d87a487ad06950 at mail.gmail.com: 

> On Dec 13, 2007 11:35 PM, Robert Gentleman <rgentlem at fhcrc.org> wrote:
>> or just try looking in the annotate package from Bioconductor
>>
> 
> Yip. annotate seems to be the most streamlined way to do this.
> 1) How does one turn the list that is created into a data frame whose
> column names are along the lines of date, title, journal, authors, etc.?

Gabor's example already did that task.
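
For readers without that example to hand, here is a minimal base-R sketch of the general idea, assuming the parsed results are held as a list with one element per citation. The list `recs`, its field names, and the placeholder values are invented for illustration and are not output from annotate or PubMed.

```r
# Hypothetical parsed records: one list element per citation.
# All values are placeholders, not real citations.
recs <- list(
  list(date = "2007-01-01", title = "Placeholder title A",
       journal = "Journal A", authors = c("Author One", "Author Two")),
  list(date = "2007-02-01", title = "Placeholder title B",
       journal = "Journal B", authors = "Author Three")
)

# Build a one-row data frame per record, then stack the rows.
# Multiple authors are collapsed into a single string per citation.
recs.df <- do.call(rbind, lapply(recs, function(r) {
  data.frame(date    = r$date,
             title   = r$title,
             journal = r$journal,
             authors = paste(r$authors, collapse = "; "),
             stringsAsFactors = FALSE)
}))

print(recs.df)
```

Collapsing the authors into one string keeps one row per citation; a list column would be another option.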

> 2) I have already created a standing search in PubMed using MyNCBI.
> There are many ways I can feed those results to the pubmed() function.
> The most brute-force way is to run the search, output the data as a
> UI List, and get that into the pubmed brackets. A way that involved
> more finesse would allow me to create an RSS feed based on my search
> and then give the RSS feed URL to the pubmed function. Or perhaps one
> could just plop the query inside the pubmed function:
> pubmed(somefunction("Laryngeal Neoplasms"[MeSH] AND "Papilloma"[MeSH])
> OR ((("recurrence"[TIAB] NOT Medline[SB]) OR "recurrence"[MeSH Terms]
> OR recurrent[Text Word]) AND respiratory[All Fields] AND
> (("papilloma"[TIAB] NOT Medline[SB]) OR "papilloma"[MeSH Terms] OR
> papillomatosis[Text Word])))
> 
> Does "somefunction" exist?

I could not find one. The pubmed function appears to assume that you
already have a list of PMIDs. When I set up a function to take an
arbitrary PubMed search string (quoted by the user) and return the
PMIDs, I had success by following Gabor's example:

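> library(XML)  # provides xmlTreeParse(), xpathApply() and xmlValue()
> pm.srch <- function() {
   # ESearch URL stem; the search term is appended to it
   srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
   # read the quoted search string typed at the console
   query <- as.character(scan(file = "", what = "character"))
   doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
                       useInternalNodes = TRUE)
   # collect the text of every <Id> node, i.e. the PMIDs
   sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
 }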
> pm.srch()
1: "laryngeal neoplasms[mh]"
2: 
Read 1 item
      //Id      
 [1,] "18042931"
 [2,] "18038886"
 [3,] "17978930"
 [4,] "17974987"
 [5,] "17972507"
 [6,] "17970149"
 [7,] "17967299"
 [8,] "17962724"
 [9,] "17954109"
[10,] "17942038"
[11,] "17940076"
[12,] "17848290"
[13,] "17848288"
[14,] "17848287"
[15,] "17848278"
[16,] "17938330"
[17,] "17938329"
[18,] "17918311"
[19,] "17910347"
[20,] "17908862"
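
The function above pastes the raw search string straight onto the URL, spaces and brackets included. A more defensive variant would percent-encode the term first. The following is only a sketch under that assumption: `esearch.url` is a made-up helper name, not something from the XML or annotate packages, and I have not checked which characters the server actually insists on having escaped.

```r
# Hypothetical helper: build an ESearch URL from a raw PubMed query string.
esearch.url <- function(query,
    stem = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=") {
  # URLencode() lives in utils; reserved = TRUE also escapes characters
  # such as [ ] and & so they cannot be mistaken for URL syntax.
  paste(stem, URLencode(query, reserved = TRUE), sep = "")
}

u <- esearch.url("\"Laryngeal Neoplasms\"[MeSH] AND \"Papilloma\"[MeSH]")
print(u)
```

The resulting string could then be handed to xmlTreeParse(..., isURL = TRUE) in place of the paste() call.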

Emboldened by that minor success, I pushed on. PubMed reported that your
example was malformed, so I used its suggested modification:
("Laryngeal Neoplasms"[MeSH] AND "Papilloma"[MeSH]) OR (("recurrence"[TIAB] NOT Medline[SB]) OR "recurrence"[MeSH Terms] OR recurrent[Text Word]) AND respiratory[All Fields] AND (("papilloma"[TIAB] NOT Medline[SB]) OR "papilloma"[MeSH Terms] OR papillomatosis[Text Word]) 

That query returned 400+ citations, and I saved the query string to a text file.

After quite a bit of hacking (in the sense of ineffective chopping with 
a dull ax), I finally came up with:

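pm.srch <- function() {
  # requires library(XML)
  srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
  # read the saved search string from a file chosen interactively
  # (the file is expected to hold the query on a single line)
  query <- readLines(con = file.choose())
  # strip the double quotes from the query before building the URL
  query <- gsub('"', "", query, fixed = TRUE)
  doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
                      useInternalNodes = TRUE)
  # collect the text of every <Id> node, i.e. the PMIDs
  return(sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue))
}

pm.srch()  # choose the search file when prompted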
      //Id      
 [1,] "18046565"
 [2,] "17978930"
 [3,] "17975511"
 [4,] "17935912"
 [5,] "17851940"
 [6,] "17765779"
 [7,] "17688640"
 [8,] "17638782"
 [9,] "17627059"
[10,] "17599582"
[11,] "17589729"
[12,] "17585283"
[13,] "17568846"
[14,] "17560665"
[15,] "17547971"
[16,] "17428551"
[17,] "17419899"
[18,] "17419519"
[19,] "17385606"
[20,] "17366752"
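
Only 20 PMIDs come back even though the search matches 400+ citations. ESearch returns 20 IDs per request unless a retmax parameter is supplied, and it reports the total number of hits in a Count element. The sketch below shows one way to request more IDs and to read both pieces from the reply. The helper names `with.retmax` and `parse.esearch` are invented for illustration, and `sample.xml` is a hand-written toy reply in the ESearch layout, not real server output.

```r
# Hypothetical helper: append the ESearch retmax parameter to a search URL.
with.retmax <- function(url, retmax) {
  # as.integer() avoids scientific notation such as "1e+05" in the URL
  paste(url, "&retmax=", as.integer(retmax), sep = "")
}

# Hypothetical helper: pull the hit count and the PMIDs out of an
# ESearch reply supplied as text. Needs the XML package.
parse.esearch <- function(xml.text) {
  doc <- XML::xmlParse(xml.text, asText = TRUE)
  list(
    count = as.integer(XML::xpathSApply(doc, "/eSearchResult/Count",
                                        XML::xmlValue)),
    ids   = XML::xpathSApply(doc, "/eSearchResult/IdList/Id", XML::xmlValue)
  )
}

# Toy reply, written by hand for this example
sample.xml <- paste(
  "<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>",
  "<IdList><Id>18046565</Id><Id>17978930</Id></IdList></eSearchResult>",
  sep = "")

if (requireNamespace("XML", quietly = TRUE)) {
  res <- parse.esearch(sample.xml)
  print(res$count)  # total hits reported in <Count>
  print(res$ids)    # plain character vector of PMIDs
}
```

xpathSApply() gives a plain character vector rather than the one-column matrix produced by the sapply(..., xpathApply, ...) idiom, which is easier to pass on to pubmed().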

-- 
David Winsemius


