[BioC] Fetching documents from PubMed

Kaustubh Patil kaustubhp_in at yahoo.com
Wed Feb 22 20:44:40 CET 2006


Hi,
    I forgot to attach the file.
    Here it is.
    
    Kaustubh

Kaustubh Patil <kaustubhp_in at yahoo.com> wrote:
 Dear Robert,
 
 Thanks for your reply. First of all something about my system,
 
 I have a Celeron 2.5 GHz with 512 MB RAM, running Fedora Core 4,
 R version 2.2.1 (2005-12-20 r36812) with RSXML 0.99.
 
 I am attaching a file that contains the 2665 PMIDs that I want to fetch. Load this file using
 
 load("ids") 
 
 and it will create a variable named ids.
 
 Then, if I use the following code, I get only 363 abstracts:
 
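 library(annotate)  # provides pubmed(), buildPubMedAbst(), abstText()
 library(XML)       # provides xmlRoot(), xmlApply()
 docs <- pubmed(ids)                      # one request for all PMIDs
 root <- xmlRoot(docs)
 arts <- xmlApply(root, buildPubMedAbst)  # one pubMedAbst object per article
 absts <- sapply(arts, abstText)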
 
 length(absts)
 [1] 363
 
 Interestingly, those are the first 363 abstracts. The 364th abstract (PMID "12136003") can be fetched manually as well as with the MedlineR library.
 
 Am I missing something here?
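One inexpensive thing to rule out before looking at the request itself is a problem in the ID vector, such as duplicates, stray whitespace, or non-numeric entries. The sketch below is only an illustration: `check_pmids` is a hypothetical helper, the toy input is made up, and nothing in this thread shows that the attached list actually has such problems.

```r
# Hypothetical helper: summarise common problems in a vector of PMIDs.
check_pmids <- function(ids) {
  ids <- as.character(ids)
  trimmed <- trimws(ids)  # drop leading/trailing whitespace
  list(
    n_total    = length(ids),
    n_unique   = length(unique(trimmed)),
    duplicated = unique(trimmed[duplicated(trimmed)]),
    # A PMID is assumed here to consist of digits only.
    malformed  = unique(trimmed[!grepl("^[0-9]+$", trimmed)])
  )
}

# Toy input, not the real list from the attachment.
toy_ids <- c("12136003", "12136003", " 11111111", "abc123", "")
report <- check_pmids(toy_ids)
print(report)
```

If `n_unique` is smaller than `n_total`, or `malformed` is non-empty, the number of abstracts returned cannot be expected to match the length of the input.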

Robert Gentleman <rgentlem at fhcrc.org> wrote:
Hi,
  pubmed makes precisely one request, so there is no issue with timing. 
In many cases you can make a single request for lots of things, rather 
than lots of requests for one thing. If you stick it in a for loop then 
there could be problems, but so far not a single person has reported 
hitting this particular wall.
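If requests do end up in a loop, for example because a long ID list is split into pieces, a pause between calls keeps to the 3-second rule mentioned in the original question. The following is a minimal sketch under stated assumptions: `split_into_batches` and `fetch_in_batches` are hypothetical helpers, `fetch_fun` is a placeholder for whatever performs the request, and the batch size of 200 is an arbitrary choice, not a documented limit.

```r
# Split a vector of IDs into consecutive batches of at most `size` elements.
split_into_batches <- function(ids, size = 200) {
  stopifnot(size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Call `fetch_fun` once per batch, pausing `delay` seconds between calls.
# `fetch_fun` is a placeholder; the default sizes are assumptions.
fetch_in_batches <- function(ids, fetch_fun, size = 200, delay = 3) {
  batches <- split_into_batches(ids, size)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    if (i > 1) Sys.sleep(delay)  # no pause before the first request
    results[i] <- list(fetch_fun(batches[[i]]))
  }
  results
}

# Toy run with a stand-in fetcher that just counts IDs, and no delay.
toy_ids <- as.character(1:5)
out <- fetch_in_batches(toy_ids, fetch_fun = function(b) length(b),
                        size = 2, delay = 0)
```

With the annotate package loaded, `pubmed` could presumably be passed as `fetch_fun`, given that it is called with a vector of IDs elsewhere in this thread. Each element of the result would then need to be parsed separately.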

As for why only 377 came back, did you check to see what happens if you 
request one of the missing ones by itself? Or go to the website at NLM 
and see if your PubMed ID is valid?
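To pick IDs for that check, it helps to list exactly which requested IDs did not come back. A minimal sketch, assuming the PMIDs of the returned records have already been extracted into a character vector (how to do that depends on the parsed objects and is not shown); `missing_pmids` is a hypothetical helper and the IDs are toy values:

```r
# IDs that were requested but are absent from what came back.
missing_pmids <- function(requested, returned) {
  setdiff(as.character(requested), as.character(returned))
}

# Toy values, not real PMIDs.
requested <- c("101", "102", "103", "104")
returned  <- c("101", "102")
missing   <- missing_pmids(requested, returned)
print(missing)
```

Any ID in `missing` can then be requested on its own or looked up on the NLM website.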

Also, please do read the posting guide and tell us something about your 
system.

thanks
  Robert


Kaustubh Patil wrote:
> Hi,
>  
>  I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?
>  
> Also, I have a list of 718 PMIDs, but the function retrieves only 377 of them. I don't understand why. Suggestions appreciated.
>  
>  Thank you and regards,
>  Kaustubh
>  
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

    
