[BioC] How do I parse HTML table using RCurl?

James F. Reid james.reid at ifom-ieo-campus.it
Tue Mar 15 09:48:54 CET 2011


Hi Ruppert,


On 03/14/2011 11:35 PM, Ruppert Valentino wrote:
>
> Hi James,
>
> Many thanks for telling me that TargetScan is accessible via AnnotationDbi; this will help me solve the problem in a different way, as the others suggested.
>
> Can you tell me whether Bioconductor has a resource for accessing miRanda (http://www.microrna.org/microrna/) and PicTar (http://pictar.mdc-berlin.de/cgi-bin/PicTar_vertebrate.cgi)?
>
> If so, which library can I use?

No, I'm afraid these two resources are not available within
Bioconductor. The miRanda-based predictions at microrna.org are
available for download as tab-delimited text files. As far as I know,
this is not the case for PicTar; note also that this resource has not
been updated since March 2007.
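
A minimal sketch of how such a tab-delimited download could be read into R and reduced to the target genes of one miRNA. The file contents and the column names (mirna_name, gene_symbol, score) are assumptions made up for illustration, not the actual layout of the microrna.org files; check the header of the real download and adjust accordingly.

```r
# Stand-in for a downloaded tab-delimited predictions file. The rows and the
# column names (mirna_name, gene_symbol, score) are hypothetical.
pred_file <- tempfile(fileext = ".txt")
writeLines(c(
  "mirna_name\tgene_symbol\tscore",
  "hsa-miR-1\tGENE_A\t-1.20",
  "hsa-miR-1\tGENE_B\t-0.85",
  "hsa-miR-1\tGENE_A\t-0.40",
  "hsa-let-7a\tGENE_C\t-0.95"
), pred_file)

# read.delim() reads tab-separated text and treats the first line as a header.
pred <- read.delim(pred_file, stringsAsFactors = FALSE)

# Keep the rows for one miRNA and collect its distinct target gene symbols.
targets <- unique(pred$gene_symbol[pred$mirna_name == "hsa-miR-1"])
print(targets)
```

With a real download, pred_file would simply point at the saved file, and the filtering step would use whatever columns that file actually provides.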

Best,
J.

>
>
> Many thanks
>
> Ruppert
>
>
>
> ----------------------------------------
>> Date: Mon, 14 Mar 2011 23:15:45 +0100
>> From: james.reid at ifom-ieo-campus.it
>> To: ruppert7 at hotmail.com
>> CC: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] How do I parse HTML table using RCurl?
>>
>> Hi Ruppert,
>>
>> The TargetScan database for human and mouse is already available in
>> Bioconductor as AnnotationDbi annotation resources
>> (targetscan.Hs.eg.db and targetscan.Mm.eg.db). miRBase is available as
>> well, but without any target predictions. As others have pointed out on
>> the mailing list, I would not recommend parsing the HTML of a query,
>> because the format is likely to change over time; it is better to
>> download the database and reformat it.
>> If you are interested in providing other miRNA target prediction
>> resources to the community, I would be willing to help.
>>
>> Best,
>> J.
>>
>>
>> On 03/14/2011 09:18 PM, Ruppert Valentino wrote:
>>>
>>>
>>> Hello,
>>>
>>> I am trying to write a script that takes a miRNA as input and returns the predicted target genes for that miRNA. I am trying various software for this, one of which is TargetScan. The problem is that I don't know how to parse the HTML output table so that I get only the target genes.
>>>
>>> For example, I am searching for target genes of the miRNA mmu-miR-1 as follows:
>>>
>>> http://www.targetscan.org/cgi-bin/targetscan/vert_50/targetscan.cgi?species=Human&gid=&mir_sc=&mir_c=&mir_nc=&mirg=mmu-miR-1
>>>
>>> This generates a table
>>>
>>>
>>>
>>> The script is:
>>>
>>> URL <- "http://www.targetscan.org/cgi-bin/targetscan/vert_50/targetscan.cgi?species=Human&gid=&mir_sc=&mir_c=&mir_nc=&mirg=mmu-miR-1"
>>> # Fetch the raw HTML of the results page, one character string per line
>>> dat <- readLines(URL)
>>>
>>>
>>> But I don't know how to parse the table into separate columns so that I can take the column entitled "Human ortholog of target gene", which should contain the target genes.
>>>
>>>
>>> In the example above the first gene COL4A3 starts at HTML code:
>>>
>>> COL4A3
>>>
>>>
>>>
>>> Is there any way to format such a table into columns, then transpose the column entitled "Human ortholog of target gene" and assign it to a variable?
>>>
>>>
>>> Many thanks,
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


