[BioC] help with PubMed Central OAI

Chris Stubben stubben at lanl.gov
Fri Apr 20 19:33:56 CEST 2012


I've been using EFetch to get some full-text articles from PubMed
Central, which works fine:

library(XML)  # provides xmlParse() and xpathSApply()
url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC2784878"
x <- readLines(url)
doc <- xmlParse(x)
xpathSApply(doc, "//abstract", xmlValue)
[1] "The majority of all genes have so far been identified and annotated 
systematically through in silico gene finding. Here we report the 
finding of 3662 strand-specific transcriptionally active regions (TARs) 
in the genome of Bacillus subtilis by the use of tiling arrays.


I recently noticed that the PMC copyright notice asks users to use the FTP
or OAI service for any "automated" retrieval, so I thought I would try
OAI, but I can't get the same XPath query to work:

url <- "http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&metadataPrefix=pmc&identifier=oai:pubmedcentral.nih.gov:2784878"
x2 <- readLines(url)  # warns about an incomplete final line
doc2 <- xmlParse(x2)
xpathSApply(doc2, "//abstract", xmlValue)
list()

This query does work, so I know there is an abstract tag:
table(xpathSApply(doc2, "//*", xmlName))

           abstract                 ack           addr-line                 aff             article  article-categories
                  1                   1                   1                   1                   1                   1
         article-id        article-meta       article-title        author-notes                back                body
                  3                   1                  79                   1                   1                   1
            caption             contrib       contrib-group copyright-statement             corresp                date
                  7                   3                   1                   1                   1                   1
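A likely cause, offered as an assumption rather than something verified against the live PMC OAI response: OAI-PMH responses declare a default XML namespace, and the embedded article markup may declare its own. In XPath 1.0 an unprefixed name such as `abstract` only matches elements that are in no namespace, so `//abstract` returns nothing, while `xmlName()` still reports the local name "abstract" in the table above. The sketch below reproduces this on a tiny hypothetical record whose namespace URI (`http://example.org/hypothetical-ns`) is a placeholder, not PMC's real one, and shows two workarounds using only the XML package: matching on `local-name()`, or binding a prefix through the `namespaces` argument of `xpathSApply()`.

```r
library(XML)

# Hypothetical, trimmed-down record; the namespace URI is a placeholder
txt <- '<record xmlns="http://example.org/hypothetical-ns">
  <article><front><abstract>Example abstract text.</abstract></front></article>
</record>'
doc <- xmlParse(txt, asText = TRUE)

# Unprefixed names only match elements in no namespace: empty result
none <- xpathSApply(doc, "//abstract", xmlValue)

# Workaround 1: ignore namespaces by comparing local element names
abs1 <- xpathSApply(doc, "//*[local-name()='abstract']", xmlValue)

# Workaround 2: bind a prefix (here "x") to the namespace URI and use it
ns <- c(x = "http://example.org/hypothetical-ns")
abs2 <- xpathSApply(doc, "//x:abstract", xmlValue, namespaces = ns)
```

The `local-name()` form is quick but matches an `abstract` element from any namespace; the prefixed form is stricter but needs the actual namespace URI, which would have to be read from the downloaded OAI XML.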

Thanks for any help.
Chris Stubben



More information about the Bioconductor mailing list