[BioC] "Special" characters in URI

Francois Pepin fpepin at cs.mcgill.ca
Tue May 3 17:24:05 CEST 2005


There are safe ways of encoding URLs that contain special characters.
Each such character is replaced by a percent sign and its hex code:

(space)  %20
[        %5B
]        %5D

So your URL would be:

URL<-'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g%5Bau%5D'

That makes your snippet work just fine.
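
If you would rather not write the escapes by hand, something along these
lines could build the URL for you. This is only a minimal sketch: the
helper name encode_term is made up for illustration (it is not part of
'annotate' or XML), and it escapes every character outside the RFC 3986
"unreserved" set, which is stricter than strictly necessary.

```r
# Hypothetical helper (not from any package): percent-encode every
# character that is not a letter, a digit, or one of "-", ".", "_", "~".
encode_term <- function(term) {
  chars <- strsplit(term, "", fixed = TRUE)[[1]]
  unsafe <- !grepl("^[A-Za-z0-9._~-]$", chars)
  # Turn each unsafe character into "%XX" for every byte it contains
  chars[unsafe] <- vapply(chars[unsafe], function(ch) {
    paste0("%", toupper(as.character(charToRaw(ch))), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  paste(chars, collapse = "")
}

base_url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term="
URL <- paste0(base_url, encode_term("gorjanc g[au]"))
print(URL)
```

Only the search term is encoded, not the whole URL, so the "?" and "="
that structure the query are left alone. The result can then be passed
to xmlTreeParse(URL, isURL=TRUE) as in your session. R's utils package
also has URLencode(), which may be worth checking before rolling your own.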

http://www.macromedia.com/cfusion/knowledgebase/index.cfm?id=tn_14143
has the list.
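
A table-driven fixURLchar like the one you propose could look roughly
like this. It is a sketch under my own assumptions: the argument names
follow your from/to fragment, but the function body and the extra
entries for "%", "[" and "]" are my additions, not tested against Entrez.

```r
# Sketch of the proposed fixURLchar: replace each character in 'from'
# with the escape at the same position in 'to'.
# "%" comes first so that the escapes inserted by later replacements
# are not themselves escaped again.
fixURLchar <- function(URL,
                       from = c("%", " ", "\"", ",", "#", "[", "]"),
                       to   = c("%25", "%20", "%22", "%2C", "%23", "%5B", "%5D")) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # fixed = TRUE treats "[" and friends literally, not as regex syntax
    URL <- gsub(from[i], to[i], URL, fixed = TRUE)
  }
  URL
}

term <- fixURLchar("gorjanc g[au]")
print(term)
```

Two caveats with this approach: it should be applied once, to the raw
search term only (running it on an already escaped string turns "%20"
into "%2520"), and it only covers the characters listed in the table.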

Francois

On Mon, 2005-05-02 at 19:46, Gorjanc Gregor wrote:
> Hello!
> 
> I am crossposting this to R-help and BioC, since it is relevant to both
> groups. 
> 
> I wrote a wrapper for the Entrez search utility (link provided below),
> which can add some new search functionality to the existing code in
> Bioconductor's package 'annotate'*.
>  
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
> 
> The Entrez search utility returns an XML document, but I have a problem
> using a URI to retrieve that file, since the URI can contain characters
> that should not be there according to
> 
> http://www.faqs.org/rfcs/rfc2396.html
> 
> I encountered problems with "[" and "]" as well as with space characters.
> However, there might also be problems with other characters, i.e. the
> reserved characters in URI syntax.
> 
> My R example is:
> 
> R> library("annotate")
> Loading required package: Biobase 
> Loading required package: tools 
> Welcome to Bioconductor 
>          Vignettes contain introductory material.  To view, 
>          simply type: openVignette() 
>          For details on reading vignettes, see
>          the openVignette help page.
> R> library(XML)
> R> tmp$term <- "gorjanc g[au]"
> R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
> R> tmp
> $term
> [1] "gorjanc g[au]"
> 
> $URL
> [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
> R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
> Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
>         error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]
> 
> # so I have a problem with the space and with [ and ]
> # let's reduce the problem to just the space or just [] to be sure
> R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g"
> R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
> Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
>         error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g
> R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]"
> R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
> Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) : 
>         error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]
> 
> # now show that it works fine without special chars
> R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
> R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
> $doc
> $file
> [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
> 
> $version
> [1] "1.0"
> 
> $children
> ...
> 
> # now show a workaround for space
> R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
> R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
> $doc
> $file
> [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
> 
> $version
> [1] "1.0"
> 
> $children
> ...
> 
> As can be seen above, it is possible to handle these special
> characters, and I wonder whether this has already been done somewhere. If not,
> I am thinking of a function fixURLchar, which would replace reserved characters
> with their escape sequences. Any comments, pointers, ... ?
> 
> from = c(" ", "\"", ",", "#"),
> to = c("%20", "%22", "%2c", "%23"))
> 
> *Once I solve the problem, I will send my code to the 'annotate' maintainer,
> who can include it in the package if he wishes.
> 
> Lep pozdrav / With regards,
>     Gregor Gorjanc
> 
> ----------------------------------------------------------------------
> University of Ljubljana
> Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
> Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
> Groblje 3                   tel: +386 (0)1 72 17 861
> SI-1230 Domzale             fax: +386 (0)1 72 17 888
> Slovenia, Europe
> ----------------------------------------------------------------------
> "One must learn by doing the thing; for though you think you know it,
>  you have no certainty until you try." Sophocles ~ 450 B.C.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


