[BioC] R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Thu Jun 25 16:02:52 CEST 2009

One more thing to add:

>> Similarity	hsa-miR-130a	miRanda	miRNA_target	2	120825363	120825385	+	.	16.5359	1.687830e-02	ENST00000295228	INHBB

> R> library(biomaRt)
> R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924",
>      "NM_005924", "NM_005924", "NM_019102")
> R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
>      'ensembl_transcript_id', 'refseq_dna'), filters='refseq_dna',
>      values=refseqs, mart=hmart)
>
> R> gene.map
>  hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> 1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
> 2        MAFB ENSG00000204103       ENST00000396967  NM_005461
> 3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
> 4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102

Your original Ensembl transcript wasn't in that result. So instead of
giving `getBM` a list of RefSeq IDs to look up, we can flip the query
around and ask which RefSeq ID your "ENST00000295228" transcript maps
to. Using the same `hmart` object, you can do it like so:

R> getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
     'ensembl_transcript_id', 'refseq_dna'),
     filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)

   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
1       INHBB ENSG00000163083       ENST00000295228  NM_002193

Note that the only things we changed were the type of ID named in the
`filters` parameter and the ID we passed in to match it.
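
Since both queries differ only in the filter and the IDs, you could wrap
them in a small helper. This is just a rough sketch: `lookup.ids` and
its argument names are made up for illustration and are not part of
biomaRt, and it assumes the attribute and filter names used in this
thread are available in your mart.

```r
# Hypothetical helper (not a biomaRt function): fetch the same four
# columns for any ID type that the mart accepts as a filter.
lookup.ids <- function(ids, id.type, mart) {
  biomaRt::getBM(
    attributes=c('hgnc_symbol', 'ensembl_gene_id',
                 'ensembl_transcript_id', 'refseq_dna'),
    filters=id.type,      # e.g. 'refseq_dna' or 'ensembl_transcript_id'
    values=unique(ids),   # drop repeated IDs before querying
    mart=mart)
}
```

With the `hmart` and `refseqs` objects from above, the two lookups would
then be:

R> lookup.ids(refseqs, 'refseq_dna', hmart)
R> lookup.ids('ENST00000295228', 'ensembl_transcript_id', hmart)

If a name is rejected, `listFilters(hmart)` and `listAttributes(hmart)`
show what your mart actually offers.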

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact