[BioC] PostForm() with KEGG

Ovokeraye Achinike-Oduaran ovokeraye at gmail.com
Thu Mar 1 09:38:09 CET 2012


Hi Duncan and Martin,

My bad, no bug whatsoever...it was me. Got my code sorted for the most
part. Thanks again for all the help. It's much appreciated.

-Avoks

On Wed, Feb 29, 2012 at 12:19 PM, Ovokeraye Achinike-Oduaran
<ovokeraye at gmail.com> wrote:
> Hi Morgan,
>
> Thanks. I think there's possibly a bug with the
> getHTMLFormDescription() but I do understand what you've explained.
>
> Thanks again.
>
>
> -Avoks
>
> On Tue, Feb 28, 2012 at 6:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/28/2012 06:14 AM, Ovokeraye Achinike-Oduaran wrote:
>>>
>>> Hi Duncan,
>>>
>>> My understanding is that xpathSApply() combines both the geneSetNode()
>>> and the sapply(). I hope that this is a correct assumption. In
>>> attempting to retrieve nodes in general from the pathway, I used  both
>>>
>>> xpathSApply(doc, "//li/node()",  xmlGetAttr, "href")
>>> and
>>> xpathSApply(doc, "//li/a/node()",  xmlGetAttr, "href")
>>>
>>> and then I get nothing (NULL) back even though no visible error pops
>>> up. Is something wrong with the way I'm using the path, or do I just not
>>> yet grasp the whole XPath concept (I did read the online tutorial)?
>>
>>
>> The NULL means that no nodes match your XPath query.
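
To illustrate the difference between a query that matches and one that does not, here is a minimal sketch using the XML package. The HTML fragment and its link values are made up for illustration; they are an assumption, not the real KEGG response.

```r
library(XML)

# Hypothetical stand-in for the KEGG result page (not the real response).
html <- paste0(
  "<html><body><ul>",
  "<li><a href='/kegg-bin/show_pathway?ko00010' target='_map'>ko00010 Glycolysis</a></li>",
  "<li><a href='/kegg-bin/show_pathway?ko01100' target='_map'>ko01100 Metabolic pathways</a></li>",
  "</ul></body></html>")
doc <- htmlParse(html, asText = TRUE)

# A path that matches: the 'a' elements directly under 'li' elements.
hrefs <- xpathSApply(doc, "//li/a", xmlGetAttr, "href")

# A path that matches nothing: this fragment contains no 'td' elements.
none <- getNodeSet(doc, "//td/a")

print(hrefs)
print(length(none))
```

In XPath, a trailing `/node()` step selects the children of the matched elements, which for a link are typically text nodes that carry no attributes, so asking those nodes for "href" is unlikely to return anything useful.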
>>
>>
>>>
>>> Sorry to drag this on, but please help.
>>
>>
>> I used Duncan's RHTMLForms suggestion
>>
>>  library(RHTMLForms)
>>  url = "http://www.genome.jp/kegg/tool/map_pathway1.html"
>>  u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>  ff = getHTMLFormDescription(url)
>>
>>  fun = createFunction(ff[[1]])
>>  txt = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>            target = "alias", .url = u)
>>
>> to retrieve the text and then
>>
>>  library(XML)
>>  xml = htmlTreeParse(txt, asText=TRUE, useInternalNodes=TRUE)
>>
>> to parse to xml (maybe there is a more direct way, using the reader argument
>> to createFunction?). If I experiment a little, I see for instance that
>>
>>  getNodeSet(xml, "//li/a")
>>
>> returns the 'a' elements that are direct children of 'li' elements, and
>>
>>  getNodeSet(xml, "//li/a[@target]")
>>
>> returns the subset of those elements that have a 'target' attribute. Finally
>>
>> > head(xpathSApply(xml, "//li/a[@target]", xmlValue))
>> [1] "ko00010 Glycolysis / Gluconeogenesis"
>> [2] "ko01100 Metabolic pathways"
>> [3] "ko01110 Biosynthesis of secondary metabolites"
>> [4] "ko01120 Microbial metabolism in diverse environments"
>> [5] "ko00710 Carbon fixation in photosynthetic organisms"
>> [6] "ko00562 Inositol phosphate metabolism"
>>
>> seems to be about what you want, or
>>
>>
>> head(xpathSApply(xml, "//li/a/@href"))
>>                                                href
>> "/kegg-bin/show_pathway?13304448561022/ko00010.args"
>>                                                href
>>                     "javascript:display('ko00010')"
>>                                                href
>> "/kegg-bin/show_pathway?13304448561022/ko01100.args"
>>                                                href
>>                     "javascript:display('ko01100')"
>>                                                href
>> "/kegg-bin/show_pathway?13304448561022/ko01110.args"
>>                                                href
>>                     "javascript:display('ko01110')"
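
The output above alternates between pathway links and javascript: links. One possible way to keep only the former and pair each label with its link is sketched below. The HTML fragment, including the "show" link text, is a hypothetical imitation of that output and not the actual KEGG page.

```r
library(XML)

# Hypothetical fragment imitating the alternating links in the output above.
html <- paste0(
  "<html><body><ul>",
  "<li><a href='/kegg-bin/show_pathway?ko00010.args' target='_map'>",
  "ko00010 Glycolysis / Gluconeogenesis</a>",
  " <a href=\"javascript:display('ko00010')\">show</a></li>",
  "<li><a href='/kegg-bin/show_pathway?ko01100.args' target='_map'>",
  "ko01100 Metabolic pathways</a>",
  " <a href=\"javascript:display('ko01100')\">show</a></li>",
  "</ul></body></html>")
doc <- htmlParse(html, asText = TRUE)

# Keep only links whose href does not start with "javascript:".
links <- getNodeSet(doc, "//li/a[not(starts-with(@href, 'javascript:'))]")

# Pair the text of each remaining link with its href attribute.
pathways <- data.frame(
  label = sapply(links, xmlValue),
  href  = sapply(links, xmlGetAttr, "href"),
  stringsAsFactors = FALSE)
print(pathways)
```

Filtering inside the XPath predicate keeps the label and the link aligned, since both columns come from the same node set.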
>>
>> Maybe the KEGGSOAP package already does what you're interested in? The web
>> scraping you're doing is going to break as soon as the web site tweaks its
>> presentation.
>>
>> Or maybe
>>
>> > library(org.Hs.eg.db)
>> > head(toTable(revmap(org.Hs.egPATH)[c("00232", "04142")]))
>>  gene_id path_id
>> 1       9   00232
>> 2      10   00232
>> 3      20   04142
>> 4      53   04142
>> 5      54   04142
>> 6     162   04142
>>
>> (The KEGG information in the org.* and KEGG packages dates to the last free
>> public release, and so is starting to be dated.)
>>
>> Martin
>>
>>
>>>
>>> Thanks.
>>>
>>> Avoks
>>>
>>> On Mon, Feb 27, 2012 at 4:09 PM, Ovokeraye Achinike-Oduaran
>>> <ovokeraye at gmail.com>  wrote:
>>>>
>>>> Thank you so very much, Duncan. I will go get myself enlightened:).
>>>> Thanks again.
>>>>
>>>> Avoks
>>>>
>>>> On Mon, Feb 27, 2012 at 3:50 PM, Duncan Temple Lang
>>>> <duncan at wald.ucdavis.edu>  wrote:
>>>>>
>>>>>
>>>>> Use
>>>>>
>>>>>   target = "alias"
>>>>>
>>>>> in the call.
>>>>>
>>>>> If you don't know how to map form elements to parameters in the request,
>>>>> you can either read a tutorial on HTML forms or use the RHTMLForms
>>>>> package, which you have loaded according to your search path, e.g.
>>>>>
>>>>>  # Read the form and then turn the information into an R function.
>>>>> ff = getHTMLFormDescription("http://www.genome.jp/kegg/tool/map_pathway1.html")
>>>>> fun = createFunction(ff[[1]])
>>>>>
>>>>>  # Since the action in the form is javascript, we'll provide the
>>>>>  # URL manually.
>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>> out = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>>>>           target = "alias", .url = u)
>>>>>
>>>>> The benefits of RHTMLForms include using the same defaults
>>>>> as the form on the Web page, adding hidden parameters, and identifying
>>>>> the names of the parameters.
>>>>>
>>>>>   D
>>>>>
>>>>>
>>>>> On 2/27/12 3:08 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>
>>>>>> Hi Duncan,
>>>>>>
>>>>>> I noticed that with the script as is, it doesn't take into
>>>>>> consideration the "include alias" checkbox. I tried modifying the
>>>>>> script to force include that option but it still did not work. Any
>>>>>> ideas?
>>>>>>
>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>> data = postForm(u,
>>>>>>                .params = list(org_name = "hsadd",
>>>>>>                unclassified = paste(readLines(file.choose()), collapse = "\n"),
>>>>>>                file = "", checkbox = "alias", submit = "Exec"))
>>>>>>
>>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>> Avoks
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 27, 2012 at 10:24 AM, Ovokeraye Achinike-Oduaran
>>>>>> <ovokeraye at gmail.com>  wrote:
>>>>>>>
>>>>>>> Hi Duncan,
>>>>>>>
>>>>>>> Thanks a bunch.
>>>>>>>
>>>>>>> -Avoks
>>>>>>>
>>>>>>> On Fri, Feb 24, 2012 at 11:09 PM, Duncan Temple Lang
>>>>>>> <duncan at wald.ucdavis.edu>  wrote:
>>>>>>>>
>>>>>>>> Hi Avoks
>>>>>>>>
>>>>>>>> While the form is provided by KEGG and so bio-related,
>>>>>>>> you might have been better off posting this to the more general
>>>>>>>> r-help mailing list.
>>>>>>>>
>>>>>>>>
>>>>>>>> You are posting the HTTP request to the wrong URL. That is the URL
>>>>>>>> of the Web page that displays the form, not the URL that processes
>>>>>>>> the input from the form.
>>>>>>>> You have to look at the JavaScript that is referenced in the action
>>>>>>>> attribute of the HTML form element.
>>>>>>>>
>>>>>>>> The second issue is that you are submitting the name of a local file.
>>>>>>>> This won't work as is. You either need to indicate that this is the
>>>>>>>> name of a file to upload rather than the contents to send, or else
>>>>>>>> send the contents. In this form, you can send the contents via the
>>>>>>>> unclassified parameter.
>>>>>>>>
>>>>>>>>
>>>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>>>> data = postForm(u,
>>>>>>>>                .params = list(org_name = "hsadd",
>>>>>>>>                               unclassified = "hsa:7167 hsa:GPI cpd:C00118\nALDOA 1.2.1.12 C00236",
>>>>>>>>                               file = "", submit = "Exec"))
>>>>>>>>
>>>>>>>>
>>>>>>>> If your input is in a file, you can use
>>>>>>>>
>>>>>>>>  unclassified = paste(readLines(file.choose()), collapse = "\n")
>>>>>>>>
>>>>>>>> as the value for the unclassified parameter.
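
As a minimal sketch of sending the contents rather than the file name, the snippet below builds the value for the unclassified parameter from a file. It assumes a temporary file in place of file.choose() so that it runs non-interactively, reuses the identifiers from the example above, and only assembles the parameter list without sending a request.

```r
# Write a small example input file (identifiers taken from the example above).
input_file <- tempfile(fileext = ".txt")
writeLines(c("hsa:7167 hsa:GPI cpd:C00118", "ALDOA 1.2.1.12 C00236"), input_file)

# Read the file and join its lines with newlines into a single string.
unclassified <- paste(readLines(input_file), collapse = "\n")

# Assemble the parameters as in the postForm() call above; nothing is sent here.
params <- list(org_name = "hsadd", unclassified = unclassified,
               file = "", submit = "Exec")
str(params)
```

The resulting list could then be passed as the .params argument of postForm().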
>>>>>>>>
>>>>>>>>
>>>>>>>> There are additional parameters that the form accepts that may be
>>>>>>>> relevant for your search.
>>>>>>>>
>>>>>>>>
>>>>>>>> As for processing the results, you will want to use
>>>>>>>>
>>>>>>>>  doc = htmlParse(data, asText = TRUE)
>>>>>>>>
>>>>>>>> and then use getNodeSet()/xpathSApply() or direct tree extraction to
>>>>>>>> access the nodes you want, e.g.
>>>>>>>>
>>>>>>>>  xpathSApply(doc, "//li/a",  xmlGetAttr, "href")
>>>>>>>>
>>>>>>>>
>>>>>>>>  D.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/24/12 6:09 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am trying to use postForm() with the KEGG website but I am stuck
>>>>>>>>> on how to get my results. Is it possible (code below) or am I using
>>>>>>>>> postForm() wrongly? The code appears to run but I'm not quite sure
>>>>>>>>> how to read the results, assuming there are any. Please help.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Avoks
>>>>>>>>> ____
>>>>>>>>>
>>>>>>>>> data = postForm("http://www.genome.jp/kegg/tool/map_pathway1.html",
>>>>>>>>> org_name = "hsadd",
>>>>>>>>> file = file.choose(),
>>>>>>>>> submit = "Exec")
>>>>>>>>>
>>>>>>>>>> sessionInfo()
>>>>>>>>>
>>>>>>>>> R version 2.14.1 (2011-12-22)
>>>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>>>
>>>>>>>>> locale:
>>>>>>>>> [1] LC_COLLATE=English_xxx.1252  LC_CTYPE=English_xxx.1252
>>>>>>>>> [3] LC_MONETARY=English_xxx.1252 LC_NUMERIC=C
>>>>>>>>> [5] LC_TIME=English_xxx.1252
>>>>>>>>>
>>>>>>>>> attached base packages:
>>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>>
>>>>>>>>> other attached packages:
>>>>>>>>> [1] RHTMLForms_0.5-1 XML_3.9-4.1      RCurl_1.91-1.1   bitops_1.0-4.1
>>>>>>>>>
>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>> [1] tools_2.14.1
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>>>
>>>
>>>
>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793


