[BioC] identifying drosophila miRNA targets

James W. MacDonald jmacdon at uw.edu
Fri Mar 29 23:11:14 CET 2013


Hi Fiona,

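Probably the easiest way to do this is to convert the flybase_cg IDs to 
Ensembl IDs. The Drosophila target file carries flybase_cg transcript 
IDs in column 12 rather than Ensembl IDs, which is probably why nothing 
matches.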

## read sanger data in
## there is some weird cruft in line 4685, best to just remove the
## thirteenth column
dat <- read.table("v5.txt.drosophila_melanogaster", sep = "\t",
                  stringsAsFactors = FALSE)[, -13]
library(org.Dm.eg.db)
## map flybase_cg IDs to ensembl (strip the transcript suffix first)
x <- select(org.Dm.eg.db, gsub("-[A-Z]+", "", dat[, 12]), c("ENSEMBL"),
            "FLYBASECG")
## there are some duplicates here, but I don't think it will matter
## merge back together and write back out
dat$merge <- gsub("-[A-Z]+", "", dat[, 12])
dat2 <- merge(dat, x, by.x = "merge", by.y = 1, all.x = TRUE)
write.table(dat2, "v5.txt.drosophila_melanogaster2", sep = "\t",
            col.names = FALSE, row.names = FALSE, quote = FALSE)
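
To see what the gsub() and merge() steps do without downloading anything, 
here is a small sketch on toy data. The column names and the FBgn ID are 
made up for illustration; only the shape mirrors the real tables. It also 
shows two side effects worth knowing about: repeated rows in the mapping 
table multiply rows in the merge, and merge() puts the key column first, so 
it is worth checking head(dat2) before choosing column numbers.

```r
## Toy stand-in for the target table (illustrative column names only)
dat <- data.frame(mirna = c("dme-miR-1", "dme-miR-1", "dme-miR-2a"),
                  transcript = c("CG1234-RA", "CG1234-RB", "CG9999-RA"),
                  stringsAsFactors = FALSE)

## Strip the transcript suffix ("-RA", "-RB", ...) to get the gene-level ID
dat$merge <- gsub("-[A-Z]+", "", dat$transcript)

## Hypothetical mapping table shaped like select() output, with a repeated
## row; the ENSEMBL value is invented
x <- data.frame(FLYBASECG = c("CG1234", "CG1234"),
                ENSEMBL = c("FBgn0000001", "FBgn0000001"),
                stringsAsFactors = FALSE)

## Left join: all.x = TRUE keeps unmapped rows, with NA in ENSEMBL.
## unique(x) drops repeated mapping rows so the join does not multiply rows.
dat2 <- merge(dat, unique(x), by.x = "merge", by.y = "FLYBASECG",
              all.x = TRUE)

## The key column ("merge") is now column 1; the original columns shift right
print(names(dat2))
print(dat2)
```

Whether to call unique() on the real select() output is a judgment call; 
it only affects the size of the merged file, not which pairs are present.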

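library(affycoretools)
## note that I say the file is not sanger (the FALSE), and then tell
## mirna2mrna() which columns to use.
## miRNA and mRNA are the character vectors of your probe IDs
test <- mirna2mrna(miRNA, "v5.txt.drosophila_melanogaster2", mRNA,
                   "org.Dm.eg.db", "drosophila2.db", FALSE, 2, 14)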

With the truncated mRNA and miRNA probe IDs you give below, I get no 
mappings, but I assume you have way more mRNA transcripts.
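
If the result is still empty with the full ID lists, one quick check is 
whether the IDs overlap at all once the array suffix is removed. The sketch 
below uses made-up ID vectors and a plain sub() call as a stand-in for 
whatever mirna2mrna() does internally, so treat it as a diagnostic idea 
rather than a description of the function.

```r
## Hypothetical probe IDs and target-file miRNA names, for illustration only
probes <- c("dme-miR-1002_st", "dme-miR-124_st", "dme-miR-2a_st")
fileIds <- c("dme-miR-124", "dme-miR-2a", "dme-miR-9a")

## Drop the trailing "_st" so probe names look like miRNA names
stripped <- sub("_st$", "", probes)

## IDs present on both sides; an empty result would explain an empty list
common <- intersect(stripped, fileIds)

## Probe IDs with no counterpart in the file
unmatched <- setdiff(stripped, fileIds)

print(common)
print(unmatched)
```

With the real data, fileIds would be the miRNA column of the target file 
and probes would be featureNames(sigmiRNA).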

Let me know if this works for you.

Best,

Jim



On 3/29/2013 8:09 AM, Fiona Ingleby wrote:
> Hi Jim,
>
> Thanks very much for pointing that out - it seems mirna2mrna is 
> exactly what I was after; I don't know how I managed to overlook it.
>
> I'm a bit puzzled about the results I'm getting, however, and so if 
> you have a minute to think this through then I'd be really 
> grateful. The help pages are pretty clear, and so I've managed to get 
> the function to run with my data without any problems, but I'm getting 
> 'named list()' as output. Which might simply suggest that there are no 
> correlations between the miRNAs and mRNAs in my data (?). But I'm not 
> convinced and I'm wondering if I've done something wrong somewhere 
> along the way (I'm looking at 39 differentially expressed miRNAs along 
> with 2638 differentially expressed mRNAs, so I'd be surprised if there 
> were none that correlate with each other).
>
> I'm wondering if I'm doing something daft like using RNA IDs in the 
> wrong format (which might be one explanation for getting 0 matches 
> returned from the database?). At the moment I'm just taking character 
> vectors directly from the ExpressionSet. So I have 2 ExpressionSets, 
> each representing only the probes which are significantly 
> differentially expressed in each dataset - I've called these sigmRNA 
> (2638 x 12 samples) and sigmiRNA (39 x 12 samples) for mRNA and miRNA 
> respectively.
>
> >featureNames(sigmRNA)
>    [1] "1622906_at"   "1622915_at"   "1622917_a_at" "1622920_at"   
> "1622926_at"   "1622932_s_at" "1622935_at"   "1622940_at"   "1622946_at"
>   [10] "1622952_at"   "1622956_at"   "1622959_at"   "1622960_at"   
> "1622965_s_at" "1622974_at"   "1622975_at"   "1622978_at"   "1622992_at"
>   [19] "1623002_at"   "1623004_a_at" "1623008_at"   "1623019_a_at" 
> "1623022_at"   "1623025_at"   "1623026_a_at" "1623030_at"   "1623031_a_at"
>
> …and so on for 2638 entries.
>
> >featureNames(sigmiRNA)
>  [1] "dme-miR-1002_st" "dme-miR-1004_st" "dme-miR-1017_st" 
> "dme-miR-124_st"  "dme-miR-2500_st" "dme-miR-286_st"
>  [7] "dme-miR-2a_st"   "dme-miR-306_st"  "dme-miR-310_st" 
>  "dme-miR-311_st"  "dme-miR-312_st"  "dme-miR-313_st"
>
> …etc. So I'm using mirna2mrna like this:
>
> test<-mirna2mrna(miRNAids=featureNames(sigmiRNA),
>   ## file downloaded from the ebi website and saved in the working directory
>   miRNAannot="v5.txt.drosophila_melanogaster",
>   mRNAids=featureNames(sigmRNA),
>   orgPkg="org.Dm.eg.db",chipPkg="drosophila2.db",
>   sanger=T,miRNAcol=NULL,mRNAcol=NULL,transType="ensembl")
>
> and then I get:
>
> > test
> named list()
>
> I've put the sessionInfo() output at the bottom of the email. I also 
> looked through the source code on the Bioconductor code search 
> website, pulled out the 'convertIDs' function, and ran this as an 
> independent function on the lists of RNAs to check to see what it was 
> doing, but I can't see anything that looks odd to me - it removes the 
> '_st'/'_at' as I expected.
>
> So I'm a bit stuck. I'm sure I've misunderstood something, but can't 
> pick out what it is myself. I suppose it's totally possible that the 
> analysis is fine and there are just no correlations between the miRNAs 
> and mRNAs of interest in my data - but I thought I would check. If you 
> (or anyone) has any ideas, I'd really appreciate the help.
>
> Thanks again,
>
> Fiona
>
> Dr Fiona C Ingleby
>
> Postdoctoral Research Fellow
> University of Sussex
>
> Email: F.Ingleby at sussex.ac.uk
> Website: fionaingleby.weebly.com
> Tel: +44(0)1273678559
>
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] drosophila2.db_2.8.1 org.Dm.eg.db_2.8.0   RSQLite_0.11.2       
> DBI_0.2-5            AnnotationDbi_1.20.7 Biobase_2.18.0
> [7] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.16.6  parallel_2.15.2 stats4_2.15.2   tools_2.15.2
>
>
>
> On 28 Mar 2013, at 16:43, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Fiona,
>>
>> I have a function called mirna2mrna (yeah, I know, lame function 
>> name...) in my affycoretools package that does this, based on the 
>> sanger microcosm targets data that you can download here:
>>
>> http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl
>>
>> there is also a function makeHmap() that will create a heatmap with 
>> the miRNA/mRNA pairs, where the color of the cells is based on the 
>> correlation between the two RNA species (with the intent to show 
>> negative correlations, indicating that the miRNA is hypothetically 
>> causing premature degradation of the mRNA).
>>
>> I think the help pages for these two functions are reasonable, but 
>> please let me know if you have any questions.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 3/28/2013 12:30 PM, Fiona Ingleby wrote:
>>> Hi everyone,
>>>
>>> I am working with mRNA data from Affy 'drosophila2' arrays and miRNA 
>>> data from Affy 'mirna3' arrays. I have identified a list of 
>>> differentially expressed mRNAs and miRNAs. I'm having a bit of 
>>> trouble with some downstream analyses and I'm hoping someone might 
>>> be able to offer some help.
>>>
>>> I would like to use my list of differentially expressed miRNAs to 
>>> access online databases (e.g. miRBase, microRNA.org…) and extract 
>>> the names of all the potential target mRNAs. Then I'd like to use 
>>> this list of mRNAs to look through my mRNA expression data. I'm 
>>> aware of packages like 'RmiR' and 'microRNA' which have built-in 
>>> functions for finding miRNA targets, but as far as I can tell, 
>>> 'RmiR' uses miRNA databases for humans only and 'microRNA' works 
>>> with human and mouse data only. So is there a package I am unaware 
>>> of (or another application of 'RmiR'/'microRNA' that I am unaware 
>>> of) for looking at drosophila data?
>>>
>>> So far I have also considered the 'biomaRt' package to see if the 
>>> database query function on there can help me, but I haven't had much 
>>> luck. For instance, if I try an example list of miRNAs:
>>>
>>> mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
>>> library(biomaRt)
>>> ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
>>> getBM(attributes="mirbase_accession",filters="mirbase_id",values=mirna,mart=ensembl)
>>>
>>> then 'logical(0)' is returned, as if there are no records for those 
>>> miRNAs - but by searching the database manually I know the records 
>>> are there.
>>>
>>> Alternatively I can try:
>>>
>>> miRNA<- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id", 
>>> "start_position", "chromosome_name"), filters = c("with_mirbase"), 
>>> values = list(T), mart = ensembl)
>>>
>>> which returns a table of various bits of information on miRNAs, but 
>>> I cannot adapt this command to just look at my list of miRNAs of 
>>> interest (ie. the 'mirna' vector above). I've included the 
>>> sessionInfo() output for these at the bottom of the email, but I 
>>> suspect my problem is more to do with the fact I'm not going about 
>>> this the right way (as opposed to a problem with package versions 
>>> and coding etc.). I'm not even sure that using 'biomaRt' will give 
>>> me the information I eventually want (the target mRNAs of these 
>>> miRNAs), I was just trying it out, to see what it was capable of in 
>>> terms of querying these databases.  So I apologise for the 
>>> vagueness. Since I haven't managed to get very far by myself then 
>>> it's difficult to be more specific, but I'd really appreciate it if 
>>> anyone could offer some advice, even just to point me in the 
>>> direction of a useful package which might have gone unnoticed by me.
>>>
>>> Many thanks,
>>>
>>> Fiona
>>>
>>> Dr Fiona C Ingleby
>>> Postdoctoral Research Fellow
>>> University of Sussex
>>> Email: F.Ingleby at sussex.ac.uk
>>> Website: fionaingleby.weebly.com
>>>
>>>
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] biomaRt_2.14.0     affy_1.36.1        Biobase_2.18.0 
>>>     BiocGenerics_0.4.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] affyio_1.26.0         BiocInstaller_1.8.3   grid_2.15.2 
>>>           lattice_0.20-14       Matrix_1.0-11         MCMCglmm_2.17
>>>  [7] preprocessCore_1.20.0 RCurl_1.95-4.1        tools_2.15.2 
>>>          XML_3.95-0.2          zlibbioc_1.4.0
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
