[BioC] CDF for GeneChip miRNA 2 array - Is there a miRNA 3 CDF?

Fri Nov 9 18:15:35 CET 2012

Hi Stephen,

On 11/9/2012 10:27 AM, Stephen Turner wrote:
> Thanks Jim. Yes, that is the initial goal, looking for differentially
> expressed miRNAs. Perhaps downstream some target prediction is likely
> in order, or perhaps some pathway analysis based on targets of
> differentially regulated miRNAs (e.g. something like
> http://www.ncbi.nlm.nih.gov/pubmed/22649059).

Well that appears to be the state of the art right now for these arrays, 
but I find it wholly unsatisfying. Predicting mRNAs or pathways that 
*might* be targeted by one or more miRNA transcripts is a far cry from 
being able to say that something is in fact happening.

We have actually processed a fair number of these arrays in our core, 
and I'm on the fence. There are usually only a handful of miRNAs that 
are differentially expressed, but when you map them to the 
hypothetically targeted transcripts, you can end up with some huge 
fraction of the transcriptome.

What we have been doing is to pair miRNA and mRNA analysis on the same 
samples, looking for genes that are differentially expressed and 
negatively correlated with the miRNA expression. But this is somewhat 
limited, as an expression array can only pick up a correlation that 
comes from the mRNA being destabilized by the miRNA (or targeted for 
premature degradation). I wonder if the really interesting correlation 
is between miRNA expression and mRNA translation, which you can't get 
at with an expression array.
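
For what it's worth, the pairing step looks roughly like the sketch 
below. It runs on simulated data in base R only; the object names 
(mirna, mrna), the number of samples and the -0.8 cutoff are all made 
up for illustration, and in practice you would first restrict to genes 
and miRNAs that are already differentially expressed.

```r
set.seed(1)
n <- 12                                    # paired samples (hypothetical)
mirna <- rnorm(n)                          # log2 expression of one miRNA
mrna <- matrix(rnorm(5 * n), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
## plant one gene that goes down as the miRNA goes up
mrna["gene1", ] <- -mirna + rnorm(n, sd = 0.1)

## Pearson correlation of each gene with the miRNA across samples
r <- apply(mrna, 1, cor, y = mirna)

## keep genes that are strongly negatively correlated (arbitrary cutoff)
candidates <- names(r)[r < -0.8]
```

A negative correlation here is only consistent with destabilization; it 
says nothing about translational repression.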

So I wonder if we are just doing something because we can't do what we 
really want to do, and doing nothing isn't an option. You can't get 
grants by sitting around waiting for the right technique to arrive, now 
can you?

>
> But correct - right now, it's only differentially regulated miRNAs
> that the PI is after. I'll have to take a look at Affy's QC tool -
> I've always used BioC, never Affy's software. This is likely a one-off
> analysis, as not many folks here are using these chips, so it might
> not be worth building a reproducible R script if I won't be doing
> these very often. However, it would be nice to be able to annotate
> these results with links to the miRbase page, sort of like what I do
> with Entrez IDs for Gene ST arrays.

That shouldn't be too difficult. Note that the search page can be 
accessed by appending the correct ID to the end of

http://www.mirbase.org/cgi-bin/query.pl?terms=

and you can create HTML tables using the xtable package, but you have to 
pass in the correct data to get a working URI.

Something like

fun <- function(x){
     ## wrap each ID in an HTML link to its miRBase search page
     paste("<a href=\"http://www.mirbase.org/cgi-bin/query.pl?terms=",
           x, "\">", x, "</a>", sep = "")
}

then you can use affycoretools:::convertIDs() to convert the probeset 
IDs to miRBase IDs.

Fake up some stuff:

library(pd.mirna.3.0)
library(xtable)  # for xtable() and its print method

con <- db(pd.mirna.3.0)
## take the first few human (hsa) probeset IDs from the annotation database
ids <- head(grep("^hsa",
                 dbGetQuery(con, "select man_fsetid from featureSet;")[,1],
                 value = TRUE))
links <- fun(affycoretools:::convertIDs(ids))

fc <- rnorm(length(ids))  # made-up fold changes
print(xtable(data.frame(miRbaseIDs = links, FoldChange = fc)),
      type = "html", include.rownames = FALSE,
      sanitize.text.function = function(x) x, file = "tmp.html")

Or you can use the R2HTML package.

>
> So if not RMA, what alternative is better for processing the affybatch
> into an expressionset?

I forget what the miRNA QC tool does as the default, and I can't get it 
to run on my 64-bit Windows box to see. The manual doesn't appear to say 
what the default is, although it may well be RMA. I don't recall there 
being much difference between the two, and having no way to say what the 
truth is, any claim of 'better' would be pure conjecture. My point was 
simply that RMA is sort of silly in this case, as all of the probes are 
identical, and measure the same thing.
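
To make that concrete: when the probes in a probeset are replicates of 
the same sequence, a plain per-probeset median of the log2 intensities 
is about the simplest summary you could use. The sketch below is NOT 
what the Affy QC tool does (I don't know its default); it skips 
background correction and normalization, and the probeset names and 
intensities are simulated for illustration.

```r
set.seed(2)
## hypothetical probe-level data: 3 probesets x 4 replicate probes, 2 arrays
probeset <- rep(c("miR-a", "miR-b", "miR-c"), each = 4)
pm <- matrix(2^rnorm(24, mean = 8), ncol = 2,
             dimnames = list(NULL, c("array1", "array2")))

## median of the log2 intensities per probeset, computed array by array
summarized <- apply(log2(pm), 2, function(x) tapply(x, probeset, median))
```

The result has one row per probeset and one column per array, which is 
the same shape as the expression matrix you would get from rma().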

Best,

Jim

>
> Thanks,
>
> Stephen
>
> On Thu, Nov 8, 2012 at 6:01 PM, James W. MacDonald<jmacdon at uw.edu>  wrote:
>> Hi Stephen,
>>
>>
>> On 11/8/2012 5:25 PM, Stephen Turner wrote:
>>> Thanks much. I used read.celfiles() and rma() worked perfectly at this
>>> point. I will definitely take you up on help getting this to gel with
>>> the rest of my workflow.
>>>
>>> My next step with gene ST arrays is to annotate the expressionset
>>> object with fData, such that when I use topTable() later on, all my
>>> results are annotated. E.g.:
>>>
>>> ## Which annotation package are you using?
>>> eset@annotation
>>> annodb <- "hugene10sttranscriptcluster.db"
>>>
>>> ## Annotate the features
>>> ls(paste("package:", annodb, sep=""))
>>> ID <- featureNames(eset)
>>> Symbol <- as.character(lookUp(ID, annodb, "SYMBOL"))
>>> Name <- as.character(lookUp(ID, annodb, "GENENAME"))
>>> Entrez <- as.character(lookUp(ID, annodb, "ENTREZID"))
>>> tmp <- data.frame(ID=ID, Entrez=Entrez, Symbol=Symbol, Name=Name,
>>>                   stringsAsFactors=FALSE)
>>> tmp[tmp=="NA"] <- NA
>>> fData(eset) <- tmp
>>>
>>> But I'm not sure what to do here because ls("package:pd.mirna.3.0")
>>> doesn't return what the typical hu/mogene10sttranscriptcluster.db DBs
>>> return.
>>
>> Right. Note that something like the MoGene ST chip measures mRNA, whereas
>> the mirna 3.0 measures miRNA, which is a completely different class of RNA.
>> While some miRNAs have Entrez Gene IDs, they don't have symbols or names
>> that I know of.
>>
>> miRNAs target various mRNA species for either silencing (by binding to the
>> mRNA transcript, making it double stranded in a particular region, thereby
>> eliminating translation to protein) or for premature degradation.
>>
>> To make things more complicated, the mRNA that are thought to be targeted by
>> a given miRNA are based on one or more of sequence homology, conservation,
>> thermodynamic properties and something else that escapes me right now. In
>> other words, the targeting of mRNA by miRNA is almost always computationally
>> derived. So depending on which algorithm (and what cutoffs you use), you can
>> get from zero to thousands of mRNAs targeted by a given miRNA.
>>
>> As an example, go here:
>>
>> http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0003205
>>
>> this is just some random miRNA I searched for. Now scroll down to the
>> 'Mature sequence' section, and click on some of the links for Predicted
>> targets. Fun, huh?
>>
>> Also note that the miR 3.0 chip has miRNA for lots of different species, as
>> well as the hairpin configuration (which AFAICT is all garbage, but YMMV).
>> So you may or may not want to be filtering out miRNA for uninteresting
>> species, depending on whether or not you (or your PI) think a particular
>> miRNA from say M. nemestrina is also expressed in the species you are
>> working with.
>>
>> Also note that RMA is sort of silly for these arrays anyway. A mature miRNA
>> is 21-23 bases long, and the affy chip uses 25 mers. So the replicate probes
>> in a probeset are usually just the same thing in a different place on the
>> chip. You could make the argument that the algorithm used in the miRNA QC
>> tool that Affy will give you for free does a better job.
>>
>> So is the goal here to just find differentially expressed miRNAs?
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> Many thanks,
>>>
>>> Stephen
>>>
>>> On Thu, Nov 8, 2012 at 10:32 AM, Benilton Carvalho
>>> <beniltoncarvalho at gmail.com>   wrote:
>>>> The problem is that you have both affy and oligo loaded simultaneously
>>>> (I'll add this to my todo list, so in the future users do not need to
>>>> worry about it).
>>>>
>>>> Option 1)  (don't load oligo)
>>>>
>>>> By using ReadAffy(), you're importing the data via the affy package,
>>>> which does not know how to handle miRNA-3.0 arrays.
>>>>
>>>> If you'd rather stick to your original workflow, you'd need to follow the
>>>> "unrecommended" path of converting a PGF to a CDF (I'd rather not say much
>>>> about this), and then build the required annotation packages yourself.
>>>>
>>>>
>>>> Option 2) (don't load affy)  (disclaimer: I'm the author of oligo)
>>>>
>>>> If you don't load affy and use read.celfiles (from oligo), you'll get the
>>>> rma() part done easily. At this point, I'd be happy to work with you to
>>>> incorporate tools to simplify the use of the other packages that you have
>>>> in your workflow.
>>>>
>>>>
>>>> best,
>>>> benilton
>>>>
>>>>
>>>> On 8 November 2012 15:12, Stephen Turner<vustephen at gmail.com>   wrote:
>>>>> Just wanted to resurrect this issue. I routinely analyze gene 1.0 ST
>>>>> chips in my core, but this is the first time I'm looking at the miRNA
>>>>> 3.0 chip (or any Affy miRNA chip for that matter).
>>>>>
>>>>> I understand that there's no 3.0 CDF environment available. How might
>>>>> I go about building one and incorporating that into my workflow?
>>>>>
>>>>> My typical [Hu/Mo]Gene 1.0 ST workflow goes something like this:
>>>>>
>>>>> ############################################
>>>>> ## Load data
>>>>> affybatch <- ReadAffy(filenames)
>>>>> eset <- rma(affybatch)
>>>>>
>>>>> ## Annotate
>>>>> ID <- featureNames(eset)
>>>>> Symbol <- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "SYMBOL"))
>>>>> Name <- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "GENENAME"))
>>>>> fData(eset) <- data.frame(ID=ID, Symbol=Symbol, Name=Name)
>>>>>
>>>>> ## Typical QC with arrayQualityMetrics and analysis with limma
>>>>> ############################################
>>>>>
>>>>> I'm getting this error when using rma() on the affybatch object:
>>>>>
>>>>> > rma(affybatch)
>>>>> Error in function (classes, fdef, mtable)  :
>>>>>     unable to find an inherited method for function "rma", for signature "AffyBatch"
>>>>>
>>>>> And additionally when I try to view the affybatch:
>>>>>
>>>>> AffyBatch object
>>>>> size of arrays=541x541 features (19 kb)
>>>>> cdf=miRNA-3_0 (??? affyids)
>>>>> number of samples=6
>>>>> Error in getCdfInfo(object) :
>>>>>     Could not obtain CDF environment, problems encountered:
>>>>> Specified environment does not contain miRNA-3_0
>>>>> Library - package mirna30cdf not installed
>>>>> Bioconductor - mirna30cdf not available
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> > sessionInfo()
>>>>> R version 2.15.0 (2012-03-30)
>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] pd.mirna.3.0_3.6.0         oligo_1.22.0               oligoClasses_1.20.0
>>>>>  [4] RSQLite_0.11.2             DBI_0.2-5                  biomaRt_2.14.0
>>>>>  [7] VennDiagram_1.5.1          SPIA_2.8.0                 pvclust_1.2-2
>>>>> [10] genefilter_1.40.0          gplots_2.11.0              MASS_7.3-22
>>>>> [13] KernSmooth_2.23-8          caTools_1.13               bitops_1.0-4.1
>>>>> [16] gdata_2.12.0               gtools_2.7.0               limma_3.14.1
>>>>> [19] arrayQualityMetrics_3.14.0 annotate_1.36.0            AnnotationDbi_1.20.2
>>>>> [22] affy_1.36.0                Biobase_2.18.0             BiocGenerics_0.4.0
>>>>> [25] BiocInstaller_1.8.3
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>  [1] affxparser_1.30.0     affyio_1.26.0         affyPLM_1.34.0        beadarray_2.8.1
>>>>>  [5] BeadDataPackR_1.10.0  Biostrings_2.26.2     bit_1.1-9             Cairo_1.5-1
>>>>>  [9] cluster_1.14.3        codetools_0.2-8       colorspace_1.2-0      ff_2.2-9
>>>>> [13] foreach_1.4.0         gcrma_2.30.0          GenomicRanges_1.10.3  Hmisc_3.10-1
>>>>> [17] hwriter_1.3           IRanges_1.16.4        iterators_1.0.6       lattice_0.20-10
>>>>> [21] latticeExtra_0.6-24   parallel_2.15.0       plyr_1.7.1            preprocessCore_1.20.0
>>>>> [25] RColorBrewer_1.0-5    RCurl_1.95-1.1        reshape2_1.2.1        setRNG_2011.11-2
>>>>> [29] splines_2.15.0        stats4_2.15.0         stringr_0.6.1         survival_2.36-14
>>>>> [33] SVGAnnotation_0.93-1  tools_2.15.0          vsn_3.26.0            XML_3.95-0.1
>>>>> [37] xtable_1.7-0          zlibbioc_1.4.0
>>>>>
>>>>>
>>>>> On Sat, Oct 13, 2012 at 12:56 AM, Dana Most<danamost at gmail.com>   wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Have you managed to find a cdf for the miRNA 3.0?
>>>>>> I keep getting the error : "...cdf=miRNA-3_0 (??? affyids)..."
>>>>>>
>>>>>> When I spoke to Affymetrix they said that the 3.0 version doesn't have
>>>>>> a .cdf and that a .cdf format wouldn't be compatible...
>>>>>> They said I should use the 'xps' package on the bioconductor website
>>>>>> together with a .pgf from their website.
>>>>>> 'xps' doesn't work with Windows 7, which unfortunately is what I have.
>>>>>>
>>>>>> Can anyone help me?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Dana
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099