[R] Pubmed (XML) data to data.frame

Marc Marí Dell'Olmo marceivissa at gmail.com
Wed Feb 5 01:35:01 CET 2014


Dear all,

I would like to build a data.frame from selected fields of PubMed
records. For example, I would like to run a specific search and
obtain a data.frame with the title of each article and its
publication type.

Example of syntax:

> library(reutils)
> library(XML)
>
> pmid <- esearch('"Epidemiology" [Journal]', "pubmed", mindate="2013/01/01", maxdate="2013/12/31", retmax="10000000")
Lost warning messages
NCBI requests that you provide an email address with each query to their API.
 Set the global option 'reutils.email' to your address to make this
message go away.
>
> articles <- efetch(pmid, db="pubmed", retmax="10000000")
Lost warning messages
NCBI requests that you provide an email address with each query to their API.
 Set the global option 'reutils.email' to your address to make this
message go away.
>
> journal <- articles$xmlValue("//Title")
>

But here is where I have the problem:

Each article (PMID) can have more than one type of publication.

> ptype <- articles$xmlValue("//PublicationType")

With this syntax I can select the first type of publication:
> ptype1 <- articles$xmlValue("//PublicationTypeList//PublicationType[1]")
> length(ptype1)
[1] 181
>

With this syntax I can select the second type of publication.

> ptype2 <- articles$xmlValue("//PublicationTypeList//PublicationType[2]")
> length(ptype2)
[1] 152
>

But I would like ptype2 to be a vector of length 181 (like ptype1),
with NA wherever an article has no second publication type.

As it stands I cannot build the data.frame, because no NA is returned
for the articles that have no data in ptype2:
> df1 <- data.frame(journal=journal, ptype1=ptype1, ptype2=ptype2 )
Error in data.frame(journal = journal, ptype1 = ptype1, ptype2 = ptype2) :
  arguments imply differing number of rows: 181, 152
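
To make the desired result concrete, here is a minimal sketch of the shape I am after. It is only an illustration under assumptions: the XML string is a made-up, simplified stand-in for the real efetch result (not real PubMed data), the helper name first_or_na is hypothetical, and it uses xmlParse, getNodeSet and xpathSApply from the XML package. The idea is to evaluate each XPath relative to one article node at a time, so that an article without a second publication type contributes NA instead of being dropped.

```r
library(XML)

# Toy stand-in for the fetched XML (hypothetical, simplified records)
xml_txt <- '
<PubmedArticleSet>
  <PubmedArticle>
    <Journal><Title>Journal A</Title></Journal>
    <PublicationTypeList>
      <PublicationType>Journal Article</PublicationType>
      <PublicationType>Review</PublicationType>
    </PublicationTypeList>
  </PubmedArticle>
  <PubmedArticle>
    <Journal><Title>Journal B</Title></Journal>
    <PublicationTypeList>
      <PublicationType>Letter</PublicationType>
    </PublicationTypeList>
  </PubmedArticle>
</PubmedArticleSet>'

doc <- xmlParse(xml_txt, asText = TRUE)

# One node per article, so every article yields exactly one row
article_nodes <- getNodeSet(doc, "//PubmedArticle")

# Hypothetical helper: first match of a relative XPath inside one node,
# or NA when the node has no match
first_or_na <- function(node, path) {
  val <- xpathSApply(node, path, xmlValue)
  if (length(val) == 0) NA_character_ else val[[1]]
}

# The leading "." keeps each query relative to the current article node
df1 <- data.frame(
  journal = sapply(article_nodes, first_or_na, path = ".//Title"),
  ptype1  = sapply(article_nodes, first_or_na, path = ".//PublicationType[1]"),
  ptype2  = sapply(article_nodes, first_or_na, path = ".//PublicationType[2]"),
  stringsAsFactors = FALSE
)

print(df1)
```

In this toy case ptype2 is "Review" for the first article and NA for the second, so all three columns have the same length and data.frame() no longer complains.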

How can I build this data.frame?

Best Regards,

Marc


