[BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Mon Jun 29 19:02:25 CEST 2009


 
The CORNA library can read this file directly: corna.sf.net

 
________________________________

From: mauede at alice.it [mailto:mauede at alice.it]
Sent: Mon 29/06/2009 4:12
To: michael watson (IAH-C); Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)



I have preprocessed the FASTA miRNA files.
I'd like to find an equivalent way to download and read in the file
"http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl/arch.v5.txt.homo_sapiens.zip"
without leaving R.
Maybe I should download it first using a system call, then use R's unzip, and finally read.table?
I doubt that read.table will work, because the file is not a regular table (its rows do not all have the same number of columns).
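
One way to stay inside R is sketched below, using only base functions (download.file, unzip, count.fields, read.table). It assumes the archive holds a single tab-delimited text file whose comment lines start with "#"; the helper names fetch_targets and read_ragged are made up for illustration, and the demo runs on a small made-up local file rather than on the real download.

```r
# Read a tab-delimited file whose rows may have different numbers of fields.
read_ragged <- function(path, sep = "\t") {
  # Find the widest row first, so read.table does not guess the column
  # count from the first five lines only.
  n <- max(count.fields(path, sep = sep, quote = "", comment.char = "#"),
           na.rm = TRUE)
  read.table(path, sep = sep, quote = "", comment.char = "#",
             header = FALSE, fill = TRUE,          # pad short rows
             col.names = paste0("V", seq_len(n)),
             stringsAsFactors = FALSE)
}

# Download a zip archive, unpack it, and read the first extracted file.
fetch_targets <- function(url, destdir = tempdir()) {
  zipfile <- file.path(destdir, "targets.zip")
  download.file(url, zipfile, mode = "wb")     # "wb": keep the zip binary-safe
  extracted <- unzip(zipfile, exdir = destdir) # paths of the extracted files
  read_ragged(extracted[1])
}

# Local demo on made-up data (no network access needed).
demo_file <- tempfile(fileext = ".txt")
writeLines(c("# made-up comment line",
             "toy-miR-1\tGENE_A",
             "toy-miR-2\tGENE_B\textra-field",
             "toy-miR-3"), demo_file)
demo <- read_ragged(demo_file)
print(dim(demo))
```

For the real file, the call would be fetch_targets() with the URL quoted above; no system call is needed, since download.file and unzip both ship with base R.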

Thank you in advance for your help.

Maura


-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Sun 28/06/2009 16.50
To: mauede at alice.it; Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: RE: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Hi Maura

Well, you can get gene:target info from miRBase, read in using CORNA or just read.table.
You can get miRNA sequences also from miRBase using readFASTA.
You can get ensembl gene sequences using biomaRt.
You can read in miRecords data using RODBC.

You can then link this all together using merge(), though I appreciate some work needs to be done on the list provided by readFASTA.
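
The linking step can be sketched with base R alone. The sketch assumes that each element of the list returned by readFASTA is itself a list with "desc" and "seq" entries, and that the miRNA name is the first word of the description. All identifiers, sequences, and column names below are made-up placeholders, not real miRBase or Ensembl data.

```r
# Toy stand-in for the readFASTA result (assumed shape: list(desc, seq)).
matfas <- list(
  list(desc = "toy-miR-1 made-up record", seq = "ACGUACGU"),
  list(desc = "toy-miR-2 made-up record", seq = "UUGCAUUG")
)

# Flatten the list into a data frame keyed on the miRNA name.
mirna_seq <- data.frame(
  mirna     = vapply(matfas, function(r) strsplit(r$desc, " ")[[1]][1],
                     character(1)),
  mirna_seq = vapply(matfas, function(r) r$seq, character(1)),
  stringsAsFactors = FALSE
)

# Toy miRNA:gene target table (e.g. from a targets file).
targets <- data.frame(
  mirna   = c("toy-miR-1", "toy-miR-1", "toy-miR-2"),
  gene_id = c("GENE_A", "GENE_B", "GENE_A"),
  stringsAsFactors = FALSE
)

# Toy 3'UTR sequences keyed on gene id (e.g. from a biomaRt query).
utr <- data.frame(
  gene_id = c("GENE_A", "GENE_B"),
  utr3    = c("AAAAUUUU", "CCCCGGGG"),
  stringsAsFactors = FALSE
)

# Two merges give one row per (miRNA, gene) pair with both sequences.
pairs <- merge(merge(targets, mirna_seq, by = "mirna"), utr, by = "gene_id")
print(pairs)
```

By default merge() keeps only rows whose key appears in both tables; all.x = TRUE would keep target pairs that lack a sequence.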

Other than actually doing the work for you, I'm not sure what else we can do.... :)

Mick

-----Original Message-----
From: mauede at alice.it [mailto:mauede at alice.it]
Sent: Sun 28/06/2009 3:35 PM
To: michael watson (IAH-C); Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: R: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Thank you very much.
I just realized the BioMart server is up and running again.
I have now learnt that BioMart can extract a lot of data from Ensembl (which is where I was told to get the gene info)
and can also download the compressed files of validated miRNAs.

I stress, though, that the main problem I am experiencing is still open.
I have to find a piece of data that lets me relate all the gene info I can get by querying Ensembl through BioMart
to the downloaded miRNA info. This is because the miRNA identifier is not available through BioMart ... I wish I were mistaken.

However, some other (unique?) miRNA attribute that is available through BioMart is also present in the VALIDATED targets file, which can be downloaded in XLS format from miRecords. This piece of data would allow me to relate the gene 3'UTR string to the targeting miRNA.
The issue is that I do not know how often the miRecords file is updated, and the download has to be performed outside the R environment.
Maybe R could handle the download automatically through the R "system" function, and the XLS file could then be processed with the R package
"RExcelInstaller" ... just a speculation ...

Regards,
Maura


-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Sun 28/06/2009 10.15
To: Steve Lianoglou
Cc: mauede at alice.it; Sean Davis; bioconductor List
Subject: RE: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

The power of Bioconductor :D

So, some code would look like this:

> library(Biostrings)  # provides readFASTA
> # mature miRNA sequences, read straight from the gzipped FASTA file
> mat <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz"))
> matfas <- readFASTA(mat, strip.descs=TRUE)
> # miRNA* (star) sequences
> matstar <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/maturestar.fa.gz"))
> matstarfas <- readFASTA(matstar, strip.descs=TRUE)
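
If readFASTA is not available in your Biostrings version, the same gzcon connection can be parsed with base R. What follows is only a rough sketch: read_fasta_con is a made-up helper, not part of Biostrings, and the demo reads a small made-up gzipped file from disk instead of the FTP site.

```r
# Parse FASTA records from any connection (for example gzcon(url(...))).
read_fasta_con <- function(con) {
  lines <- readLines(con)
  close(con)
  lines  <- lines[nzchar(lines)]            # drop empty lines
  is_hdr <- startsWith(lines, ">")          # header lines start with ">"
  n_rec  <- sum(is_hdr)
  # Assign every sequence line to the record opened by the last header.
  rec  <- factor(cumsum(is_hdr)[!is_hdr], levels = seq_len(n_rec))
  seqs <- tapply(lines[!is_hdr], rec, paste, collapse = "")
  data.frame(desc = sub("^>", "", lines[is_hdr]),
             seq  = unname(as.vector(seqs)),
             stringsAsFactors = FALSE)
}

# Local demo: write a tiny gzipped FASTA file with made-up records.
fa_file <- tempfile(fileext = ".fa.gz")
gz <- gzfile(fa_file, "w")
writeLines(c(">toy-miR-1 made-up record", "ACGU", "ACGU",
             ">toy-miR-2 made-up record", "UUGCA"), gz)
close(gz)

fa <- read_fasta_con(gzcon(file(fa_file, "rb")))
print(fa)
```

The result is already a data frame with one row per record, so it can go straight into merge() without the list-flattening step.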


-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
Sent: Sun 28/06/2009 8:51 AM
To: michael watson (IAH-C)
Cc: mauede at alice.it; Sean Davis; bioconductor List
Subject: Re: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

> They'll be in fasta format, and whether or not Bioconductor can read 
> them in I have no idea - I use Bioperl for all my sequence handling.


Yes, Bioconductor can: the Biostrings package provides readFASTA and 
writeFASTA, which handle this for you.

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos












