[BioC] CDF for GeneChip miRNA 2 array - Is there a miRNA 3 CDF?

James F. Reid reidjf at gmail.com
Sat Nov 10 13:36:15 CET 2012


Hi Stephen and James,

a comment concerning the use of (mature) microRNA names below.

On 09/11/12 17:15, James W. MacDonald wrote:
> Hi Stephen,
>
> On 11/9/2012 10:27 AM, Stephen Turner wrote:
>> Thanks Jim. Yes, that is the initial goal, looking for differentially
>> expressed miRNAs. Perhaps downstream some target prediction is likely
>> in order, or perhaps some pathway analysis based on targets of
>> differentially regulated miRNAs (e.g. something like
>> http://www.ncbi.nlm.nih.gov/pubmed/22649059).
>
> Well that appears to be the state of the art right now for these arrays,
> but I find it wholly unsatisfying. Predicting mRNAs or pathways that
> *might* be targeted by one or more miRNA transcripts is a far cry from
> being able to say that something is in fact happening.
>
> We have actually processed a fair number of these arrays in our core,
> and I'm on the fence. There are usually only a handful that are
> differentially expressed, but when you map them to the hypothetically
> targeted transcripts, you can end up with some huge fraction of the
> transcriptome.
>
> What we have been doing is to pair miRNA and mRNA analysis on the same
> samples, looking for genes that are differentially expressed and
> negatively correlated to the miRNA expression. But this is somewhat
> limited, as the correlation can only be due to mRNA being destabilized
> by miRNA (or targeted for premature degradation). But I wonder if the
> real interesting correlation is between miRNA expression and mRNA
> translation, which you can't get at with an expression array.
>
> So I wonder if we are just doing something because we can't do what we
> really want to do, and doing nothing isn't an option. You can't get
> grants by sitting around waiting for the right technique to arrive, now
> can you?
>
>
>>
>> But correct - right now, it's only differentially regulated miRNAs
>> that the PI is after. I'll have to take a look at Affy's QC tool -
>> I've always used BioC, never Affy's software. This is likely a one-off
>> analysis, as not many folks here are using these chips, so it might
>> not be worth building a reproducible R script if I won't be doing
>> these very often. However, it would be nice to be able to annotate
>> these results with links to the miRbase page, sort of like what I do
>> with Entrez IDs for Gene ST arrays.
>
> That shouldn't be too difficult. Note that the search page can be
> accessed by appending the correct ID to the end of
>
> http://www.mirbase.org/cgi-bin/query.pl?terms=
>
> and you can create HTML tables using the xtable package, but you have to
> pass in the correct data to get a working URI.
>
> Something like
>
> fun <- function(x){
>      paste("<a href=\"http://www.mirbase.org/cgi-bin/query.pl?terms=",
> x, "\">", x, "</a>", sep = "")
> }
>
> then you can use affycoretools:::convertIDs() to change to mirBase IDs
>
> Fake up some stuff:
>
> library(pd.mirna.3.0)
> library(xtable)
> con <- db(pd.mirna.3.0)
> ids <- head(grep("^hsa",
>     dbGetQuery(con, "select man_fsetid from featureSet;")[,1],
>     value = TRUE))
> links <- fun(affycoretools:::convertIDs(ids))
>
> fc <- rnorm(6)
> print(xtable(data.frame(miRbaseIDs = links, FoldChange = fc)),
>     type = "html", include.rownames = FALSE,
>     sanitize.text.function = function(x) x, file = "tmp.html")
>
> Or you can use the R2HTML package.

These mature miRNA names are the ones used at the time of the chip
design/production by Affymetrix and inevitably change over time. I
notice many contain a * in their name, which was removed from the miRBase
nomenclature in 2011 (http://www.mirbase.org/blog/2011/04/whats-in-a-name/).
A quick check with the mirbase.db package indicates that nearly a third
are "lost" (i.e. have been renamed, or even removed, or changed species
etc. as more data about them accumulate).

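library(mirbase.db)
library(pd.mirna.3.0)

# Human probe set names as stored in the chip annotation package
con <- db(pd.mirna.3.0)
ids <- grep("^hsa",
            dbGetQuery(con, "select man_fsetid from featureSet;")[,1],
            value = TRUE)
length(ids)
#[1] 1733

# Mature names in the miRBase release packaged in mirbase.db
matureIDs <- toTable(mirbaseMATURE)[['mature_name']]

# Percentage of chip names not found among the current mature names
100 * (1 - (sum(affycoretools:::convertIDs(ids) %in% matureIDs) /
            length(ids)))
#[1] 28.04385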
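
To see how much of such a loss comes from the * naming alone, the same
check can be split by whether a name carries a star. This is only a
minimal sketch in base R on made-up names: chipNames and currentNames
below are hypothetical toy vectors, not real chip or miRBase content.

```r
# Hypothetical toy data: names as used on a chip vs. names in a newer release
chipNames    <- c("hsa-miR-aaa", "hsa-miR-aaa*", "hsa-miR-bbb*", "hsa-miR-ccc")
currentNames <- c("hsa-miR-aaa", "hsa-miR-bbb-3p", "hsa-miR-ddd")

# Which chip names still exist verbatim in the newer release?
found <- chipNames %in% currentNames

# Percentage of chip names that are "lost"
pctLost <- 100 * mean(!found)

# Cross-tabulate lost names against the presence of a literal "*"
hasStar <- grepl("*", chipNames, fixed = TRUE)
lostByStar <- table(star = hasStar, lost = !found)

print(pctLost)
print(lostByStar)
```

With real data, chipNames would be the converted ids from above and
currentNames would be matureIDs.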

Ideally one would need the sequences used to design the probes, which I
think are available from Affymetrix, to map these to the latest release
of miRBase. I couldn't find a GPL for this array on GEO.
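
If those sequences were at hand, the mapping could be done on sequence
rather than on name. A rough sketch in base R, assuming exact matches and
that both sets of sequences are in the same orientation and alphabet; the
data frames, column names and sequences below are all invented for
illustration.

```r
# Hypothetical toy data: target sequences from the chip design
chip <- data.frame(
  chipName = c("miR-x*", "miR-y", "miR-z"),
  sequence = c("ACGUACGUACGUACGUACGUA",
               "UUGGCCAAUUGGCCAAUUGGCC",
               "GGAUCCGGAUCCGGAUCCGGA"),
  stringsAsFactors = FALSE
)

# Hypothetical toy data: mature sequences from a newer reference release
current <- data.frame(
  currentName = c("miR-x-3p", "miR-y"),
  sequence    = c("ACGUACGUACGUACGUACGUA",
                  "UUGGCCAAUUGGCCAAUUGGCC"),
  stringsAsFactors = FALSE
)

# Look up each chip sequence in the reference; NA means no exact match
idx <- match(chip$sequence, current$sequence)
chip$currentName <- current$currentName[idx]

print(chip[, c("chipName", "currentName")])
```

Entries left as NA would be the ones needing a closer look (removed
entries, or sequences that were revised rather than just renamed).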

Best,
J.

>
>>
>> So if not RMA, what alternative is better for processing the affybatch
>> into an expressionset?
>
> I forget what the miRNA QC tool does as the default, and I can't get it
> to run on my 64-bit Windows box to see. The manual doesn't appear to say
> what the default is, although it may well be RMA. I don't recall there
> being much difference between the two, and having no way to say what the
> truth is, any claim of 'better' would be pure conjecture. My point was
> simply that RMA is sort of silly in this case, as all of the probes are
> identical, and measure the same thing.
>
> Best,
>
> Jim
>
>
>>
>> Thanks,
>>
>> Stephen
>>
>> On Thu, Nov 8, 2012 at 6:01 PM, James W. MacDonald<jmacdon at uw.edu>
>> wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 11/8/2012 5:25 PM, Stephen Turner wrote:
>>>> Thanks much. I used read.celfiles() and rma() worked perfectly at this
>>>> point. I will definitely take you up on help getting this to gel with
>>>> the rest of my workflow.
>>>>
>>>> My next step with gene ST arrays is to annotate the expressionset
>>>> object with fData, such that when I use topTable() later on, all my
>>>> results are annotated. E.g.:
>>>>
>>>> ## Which annotation package are you using?
>>>> eset@annotation
>>>> annodb<- "hugene10sttranscriptcluster.db"
>>>>
>>>> ## Annotate the features
>>>> ls(paste("package:", annodb, sep=""))
>>>> ID<- featureNames(eset)
>>>> Symbol<- as.character(lookUp(ID, annodb, "SYMBOL"))
>>>> Name<- as.character(lookUp(ID, annodb, "GENENAME"))
>>>> Entrez<- as.character(lookUp(ID, annodb, "ENTREZID"))
>>>> tmp<- data.frame(ID=ID, Entrez=Entrez, Symbol=Symbol, Name=Name,
>>>> stringsAsFactors=F)
>>>> tmp[tmp=="NA"]<- NA
>>>> fData(eset)<- tmp
>>>>
>>>> But I'm not sure what to do here because ls("package:pd.mirna.3.0")
>>>> doesn't return what the typical hu/mogene10sttranscriptcluster.db DBs
>>>> return.
>>>
>>> Right. Note that something like the MoGene ST chip measures mRNA,
>>> whereas
>>> the mirna 3.0 measures miRNA, which is a completely different class
>>> of RNA.
>>> While some miRNAs have Entrez Gene IDs, they don't have symbols or names
>>> that I know of.
>>>
>>> miRNAs target various mRNA species for either silencing (by binding
>>> to the
>>> mRNA transcript, making it double stranded in a particular region,
>>> thereby
>>> eliminating translation to protein) or for premature degradation.
>>>
>>> To make things more complicated, the mRNA that are thought to be
>>> targeted by
>>> a given miRNA are based on one or more of sequence homology,
>>> conservation,
>>> thermodynamic properties and something else that escapes me right
>>> now. In
>>> other words, the targeting of mRNA by miRNA is almost always
>>> computationally
>>> derived. So depending on which algorithm (and what cutoffs you use),
>>> you can
>>> get from zero to thousands of mRNAs targeted by a given miRNA.
>>>
>>> As an example, go here:
>>>
>>> http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0003205
>>>
>>> this is just some random miRNA I searched for. Now scroll down to the
>>> 'Mature sequence' section, and click on some of the links for Predicted
>>> targets. Fun, huh?
>>>
>>> Also note that the miR 3.0 chip has miRNA for lots of different
>>> species, as
>>> well as the hairpin configuration (which AFAICT is all garbage, but
>>> YMMV).
>>> So you may or may not want to be filtering out miRNA for uninteresting
>>> species, depending on whether or not you (or your PI) think a particular
>>> miRNA from say M. nemestrina is also expressed in the species you are
>>> working with.
>>>
>>> Also note that RMA is sort of silly for these arrays anyway. A mature
>>> miRNA
>>> is 21-23 bases long, and the affy chip uses 25 mers. So the replicate
>>> probes
>>> in a probeset are usually just the same thing in a different place on
>>> the
>>> chip. You could make the argument that the algorithm used in the
>>> miRNA QC
>>> tool that Affy will give you for free does a better job.
>>>
>>> So is the goal here to just find differentially expressed miRNAs?
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>> Many thanks,
>>>>
>>>> Stephen
>>>>
>>>> On Thu, Nov 8, 2012 at 10:32 AM, Benilton Carvalho
>>>> <beniltoncarvalho at gmail.com>   wrote:
>>>>> The problem is that you have both affy and oligo loaded simultaneously
>>>>> (I'll
>>>>> add this to my todo list, so in the future users do not need to worry
>>>>> about
>>>>> it).
>>>>>
>>>>> Option 1)  (don't load oligo)
>>>>>
>>>>> By using ReadAffy(), you're importing the data via affy package, which
>>>>> does
>>>>> not know how to handle miRNA-3.0 arrays.
>>>>>
>>>>> If you rather stick to your original workflow, you'd need to follow
>>>>> the
>>>>> "unrecommended" path of converting a PGF to a CDF (I rather not say
>>>>> much
>>>>> about this), and then build the required annotation packages yourself.
>>>>>
>>>>>
>>>>> Option 2) (don't load affy)  (disclaimer: I'm the author of oligo)
>>>>>
>>>>> If you don't load affy and use read.celfiles (from oligo), you'll
>>>>> get the
>>>>> rma() part done easily. At this point, I'd be happy to work with
>>>>> you to
>>>>> incorporate tools to simplify the use of the other packages that
>>>>> you have
>>>>> in
>>>>> your workflow.
>>>>>
>>>>>
>>>>> best,
>>>>> benilton
>>>>>
>>>>>
>>>>> On 8 November 2012 15:12, Stephen Turner<vustephen at gmail.com>   wrote:
>>>>>> Just wanted to resurrect this issue. I routinely analyze gene 1.0 ST
>>>>>> chips in my core, but this is the first time I'm looking at the miRNA
>>>>>> 3.0 chip (or any Affy miRNA chip for that matter).
>>>>>>
>>>>>> I understand that there's no 3.0 CDF environment available. How might
>>>>>> I go about building one and incorporating that into my workflow?
>>>>>>
>>>>>> My typical [Hu/Mo]Gene 1.0 ST workflow goes something like this:
>>>>>>
>>>>>> ############################################
>>>>>> ## Load data
>>>>>> affybatch<- ReadAffy(filenames)
>>>>>> eset<- rma(affybatch)
>>>>>>
>>>>>> ## Annotate
>>>>>> ID<- featureNames(eset)
>>>>>> Symbol<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db",
>>>>>> "SYMBOL"))
>>>>>> Name<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db",
>>>>>> "GENENAME"))
>>>>>> fData(eset)<- data.frame(ID=ID, Symbol=Symbol, Name=Name)
>>>>>>
>>>>>> ## Typical QC with arrayQualityMetrics and analysis with limma
>>>>>> ############################################
>>>>>>
>>>>>> I'm getting this error when using rma() on the affybatch object:
>>>>>>
>>>>>>> rma(affybatch)
>>>>>> Error in function (classes, fdef, mtable)  :
>>>>>>     unable to find an inherited method for function "rma", for
>>>>>> signature
>>>>>> "AffyBatch"
>>>>>>
>>>>>> And additionally when I try to view the affybatch:
>>>>>>
>>>>>> AffyBatch object
>>>>>> size of arrays=541x541 features (19 kb)
>>>>>> cdf=miRNA-3_0 (??? affyids)
>>>>>> number of samples=6
>>>>>> Error in getCdfInfo(object) :
>>>>>>     Could not obtain CDF environment, problems encountered:
>>>>>> Specified environment does not contain miRNA-3_0
>>>>>> Library - package mirna30cdf not installed
>>>>>> Bioconductor - mirna30cdf not available
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.15.0 (2012-03-30)
>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] grid      stats     graphics  grDevices utils     datasets
>>>>>> methods   base
>>>>>>
>>>>>> other attached packages:
>>>>>>    [1] pd.mirna.3.0_3.6.0         oligo_1.22.0
>>>>>> oligoClasses_1.20.0
>>>>>>    [4] RSQLite_0.11.2             DBI_0.2-5
>>>>>> biomaRt_2.14.0
>>>>>>    [7] VennDiagram_1.5.1          SPIA_2.8.0
>>>>>> pvclust_1.2-2
>>>>>> [10] genefilter_1.40.0          gplots_2.11.0
>>>>>> MASS_7.3-22
>>>>>> [13] KernSmooth_2.23-8          caTools_1.13
>>>>>> bitops_1.0-4.1
>>>>>> [16] gdata_2.12.0               gtools_2.7.0
>>>>>> limma_3.14.1
>>>>>> [19] arrayQualityMetrics_3.14.0 annotate_1.36.0
>>>>>> AnnotationDbi_1.20.2
>>>>>> [22] affy_1.36.0                Biobase_2.18.0
>>>>>> BiocGenerics_0.4.0
>>>>>> [25] BiocInstaller_1.8.3
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>>    [1] affxparser_1.30.0     affyio_1.26.0         affyPLM_1.34.0
>>>>>> beadarray_2.8.1
>>>>>>    [5] BeadDataPackR_1.10.0  Biostrings_2.26.2     bit_1.1-9
>>>>>> Cairo_1.5-1
>>>>>>    [9] cluster_1.14.3        codetools_0.2-8       colorspace_1.2-0
>>>>>> ff_2.2-9
>>>>>> [13] foreach_1.4.0         gcrma_2.30.0          GenomicRanges_1.10.3
>>>>>> Hmisc_3.10-1
>>>>>> [17] hwriter_1.3           IRanges_1.16.4        iterators_1.0.6
>>>>>> lattice_0.20-10
>>>>>> [21] latticeExtra_0.6-24   parallel_2.15.0       plyr_1.7.1
>>>>>> preprocessCore_1.20.0
>>>>>> [25] RColorBrewer_1.0-5    RCurl_1.95-1.1        reshape2_1.2.1
>>>>>> setRNG_2011.11-2
>>>>>> [29] splines_2.15.0        stats4_2.15.0         stringr_0.6.1
>>>>>> survival_2.36-14
>>>>>> [33] SVGAnnotation_0.93-1  tools_2.15.0          vsn_3.26.0
>>>>>> XML_3.95-0.1
>>>>>> [37] xtable_1.7-0          zlibbioc_1.4.0
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 13, 2012 at 12:56 AM, Dana Most<danamost at gmail.com>
>>>>>> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Have you managed to find a cdf for the miRNA 3.0?
>>>>>>> I keep getting the error : "...cdf=miRNA-3_0 (??? affyids)..."
>>>>>>>
>>>>>>> When I spoke to Affymetrix they said that the 3.0 version doesn't
>>>>>>> have
>>>>>>> a
>>>>>>> .cdf and that a .cdf format wouldn't be compatible...
>>>>>>> They said I should use the 'xps' package on the bioconductor website
>>>>>>> together with a .pgf from their website.
>>>>>>> 'xps' doesn't work with Windows 7, which unfortunately is what I
>>>>>>> have.
>>>>>>>
>>>>>>> Can anyone help me?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dana
>>>>>>>
>>>>>>>           [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>


