[R] Analyzing Publications from Pubmed via XML

David Winsemius dwinsemius at comcast.net
Tue Dec 18 01:53:09 CET 2007


"Armin Goralczyk" <agoralczyk at gmail.com> wrote in
news:a695fbee0712171238g4995040x579e58f52f83376e at mail.gmail.com: 

> On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius at comcast.net>
> wrote: 
>> > pm.srch<- function (){
>>    srch.stem <-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
>>    query <-as.character(scan(file="",what="character"))
>>    doc <-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
>>          useInternalNodes = TRUE)
>>    sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
>>      }
>> > pm.srch()
>> 1: "laryngeal neoplasms[mh]"
>> 2:
>> Read 1 item
>>       //Id
>>  [1,] "18042931"
snipped list of IDs
>>
>>
> I tried the above function with simple search terms and it worked fine
> for me (also more output thanks to Martin's post) but when I use
> search terms attributed to certain fields, i.e. with [au] or [ta], I
> get the following error message:
>> pm.srch()
> 1: "laryngeal neoplasms[mh]"
> 2:
> Read 1 item
> Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
> as.logical(ignoreBlanks),  :
>   error in creating parser for
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal neoplasms[mh]
> I/O warning : failed to load external entity
> "http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&term=laryngeal%20neoplasms%5Bmh%5D"
>>
> What's wrong?

I'm not sure. You included my simple example rather than the search string
that provoked the error. Here is an example search from the how-to page
for literature searches with esearch:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3

I wonder whether you used spaces rather than "+" signs. If so, you may
want your function to do more gsub() processing of the input string.
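
A minimal sketch of that kind of gsub() processing, in case it helps. The
helper name build.esearch.url is hypothetical (it is not part of the XML
package or of my function above), and it assumes that turning runs of
whitespace into "+" is all the cleanup the query needs. It leaves the square
brackets alone, as NCBI's own example URL does, and it only builds the URL
string without contacting the server:

```r
# Hypothetical helper: build an esearch URL from a free-text query.
# Assumption: replacing whitespace with "+" is sufficient cleanup.
build.esearch.url <- function(query,
    stem = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=") {
  # scan() splits unquoted input on whitespace, so rejoin multiple items first
  query <- paste(query, collapse = " ")
  # drop leading and trailing whitespace
  query <- gsub("^[[:space:]]+|[[:space:]]+$", "", query)
  # replace each remaining run of whitespace with a single "+"
  query <- gsub("[[:space:]]+", "+", query)
  paste(stem, query, sep = "")
}

build.esearch.url("laryngeal neoplasms[mh]")
# [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal+neoplasms[mh]"
```

Inside pm.srch() the result would take the place of the
paste(srch.stem,query,sep="") argument to xmlTreeParse. A query that already
uses "+" separators passes through unchanged.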

When I use the search terms in NCBI's example I get:

> pm.srch<- function (){
+    srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
+              query<-as.character(scan(file="",what="character"))
+              doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE, useInternalNodes = TRUE)
+              sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
+      }
> doc.xml<-pm.srch()
1: "PNAS[ta]+AND+97[vi]"
2: 
Read 1 item
> doc.xml
      //Id      
 [1,] "16578858"
 [2,] "11186225"
 [3,] "11121081"
 [4,] "11121080"
 [5,] "11121079"
 [6,] "11121078"
 [7,] "11121077"
 [8,] "11121076"
 [9,] "11121075"
[10,] "11121074"
[11,] "11121073"
[12,] "11121072"
[13,] "11121071"
[14,] "11121070"
[15,] "11121069"
[16,] "11121068"
[17,] "11121067"
[18,] "11121066"
[19,] "11121065"
[20,] "11121064"


-- 
David Winsemius, MD


> Thanks for any help
> -- 
> Armin Goralczyk, M.D.


