[BioC] "Special" characters in URI

Gorjanc Gregor Gregor.Gorjanc at bfro.uni-lj.si
Tue May 3 01:46:22 CEST 2005


Hello!

I am crossposting this to R-help and BioC, since it is relevant to both
groups. 

I wrote a wrapper for the Entrez search utility (link provided below),
which adds some new search functionality to the existing code in
Bioconductor's 'annotate' package*.
 
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

The Entrez search utility returns an XML document, but I have trouble
retrieving that document via its URI, since the URI can contain
characters that should not be there according to

http://www.faqs.org/rfcs/rfc2396.html

I encountered problems with "[" and "]" as well as with space characters.
However, other characters might also cause problems, namely the reserved
characters in URI syntax.
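
To see which characters of a search term might need attention, one can
compare them against the "unreserved" set of RFC 2396 (alphanumerics plus
the marks - _ . ! ~ * ' ( )). The following is only a minimal sketch; the
variable names are made up for illustration, and it treats everything
outside the unreserved set as suspect, which is stricter than the RFC
requires for every URI component.

```r
# Unreserved characters according to RFC 2396: alphanum | mark
unreserved <- c(letters, LETTERS, 0:9,
                "-", "_", ".", "!", "~", "*", "'", "(", ")")

term <- "gorjanc g[au]"

# Split the term into single characters and keep those outside the set
chars <- strsplit(term, "")[[1]]
suspect <- unique(chars[!(chars %in% unreserved)])
print(suspect)
```

For the term used in this post, the suspect characters are the space and
the two square brackets.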

My R example is:

R> library("annotate")
Loading required package: Biobase 
Loading required package: tools 
Welcome to Bioconductor 
         Vignettes contain introductory material.  To view, 
         simply type: openVignette() 
         For details on reading vignettes, see
         the openVignette help page.
R> library(XML)
R> tmp <- list()
R> tmp$term <- "gorjanc g[au]"
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
R> tmp
$term
[1] "gorjanc g[au]"

$URL
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]

# so the space and the [ and ] characters cause a problem
# reduce the problem to just the space or just [] to be sure
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]

# now show that it works fine without special chars
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"

$version
[1] "1.0"

$children
...

# now show a workaround for space
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"

$version
[1] "1.0"

$children
...
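
The manual %20 substitution above can also be produced programmatically.
A minimal sketch, assuming that URLencode() from the 'utils' package is
available in your version of R; the variable names are illustrative only.
Only the search term is encoded, so that ":", "/" and "?" in the base URL
stay intact.

```r
base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term="
term <- "gorjanc g[au]"

# reserved = TRUE also escapes reserved characters such as "[" and "]",
# in addition to characters like the space
query <- utils::URLencode(term, reserved = TRUE)

url <- paste(base, query, sep = "")
print(url)
```

The resulting string could then be passed to xmlTreeParse(url, isURL=TRUE)
as in the examples above; whether the server interprets the escaped
brackets as intended was not checked here.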

As shown above, these special characters can be handled, and I wonder
whether this has already been done somewhere. If not, I am thinking of
writing a function fixURLchar, which would replace reserved characters
with their escaped sequences. Any comments, pointers, ... ?

from = c(" ", "\"", ",", "#"),
to = c("%20", "%22", "%2c", "%23"))
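
The from/to pairs above read like default arguments of the proposed
fixURLchar. The function body does not appear in this post, so the
following is only a hypothetical sketch of how such a function could look.
The body, the extra "[" and "]" entries and the argument name URL are
assumptions.

```r
# Hypothetical sketch of fixURLchar(); not the author's actual code.
fixURLchar <- function(URL,
                       from = c(" ", "\"", ",", "#", "[", "]"),
                       to = c("%20", "%22", "%2c", "%23", "%5b", "%5d"))
{
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # fixed = TRUE treats each pattern as a literal string, so "[" is
    # not interpreted as the start of a regular expression class
    URL <- gsub(from[i], to[i], URL, fixed = TRUE)
  }
  URL
}

# Escape only the search term, then append it to the base URL
base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term="
fixedURL <- paste(base, fixURLchar("gorjanc g[au]"), sep = "")
print(fixedURL)
```

Note that "%" itself is not in the table, so an already escaped string
passes through unchanged. Since "#" is escaped as well, the function is
better applied to the query term than to a complete URL.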

*Once I solve this problem, I will send my code to the 'annotate'
maintainer, who can include it in the package if he wishes.

Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.


