[BioC] CDF for GeneChip miRNA 2 array - Is there a miRNA 3 CDF?

Fri Nov 9 16:27:07 CET 2012

Thanks Jim. Yes, that is the initial goal: looking for differentially
expressed miRNAs. Downstream, some target prediction may be in order,
or perhaps a pathway analysis based on the targets of differentially
regulated miRNAs (e.g. something like
http://www.ncbi.nlm.nih.gov/pubmed/22649059).

But correct - right now, it's only differentially regulated miRNAs
that the PI is after. I'll have to take a look at Affy's QC tool -
I've always used BioC, never Affy's software. This is likely a one-off
analysis, as not many folks here are using these chips, so it might
not be worth building a reproducible R script if I won't be doing
these very often. However, it would be nice to be able to annotate
these results with links to the corresponding miRBase pages, much as I
do with Entrez IDs for Gene ST arrays.
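
A minimal sketch of that kind of link annotation, assuming a miRBase
accession is already known for each probe set. The helper name
mirbase_url and the toy table are hypothetical; the URL pattern is the
one used for MI0003205 further down this thread, and the mapping from
probe set to accession is not something pd.mirna.3.0 is known to
provide here.

```r
## Hypothetical helper: build miRBase entry URLs from miRBase accessions.
## The URL pattern is copied from the MI0003205 link quoted in this thread.
mirbase_url <- function(acc) {
  base <- "http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc="
  ifelse(is.na(acc), NA_character_, paste0(base, acc))
}

## Toy annotation table. The probe set names and their pairing with an
## accession are placeholders, not taken from the real array annotation.
anno <- data.frame(ID = c("probeset_A", "probeset_B"),
                   Accession = c("MI0003205", NA),
                   stringsAsFactors = FALSE)

## Probe sets without an accession keep NA instead of a broken link.
anno$miRBase <- mirbase_url(anno$Accession)
print(anno)
```

A table like this could then be assigned to fData(eset), so that
topTable() output carries the link column along.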

So if not RMA, what alternative is better for processing the affybatch
into an expressionset?
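
One simple possibility, sketched here only as an illustration: if the
replicate probes in a probe set really are the same sequence, a per
probe set median of log2 intensities may be enough. This is not the
algorithm in Affy's QC tool, and the inputs (a probe-by-sample matrix
that has already been background corrected and normalized, plus a
probe-to-probe-set vector) are assumptions.

```r
## Assumptions: `intens` is a probe-by-sample matrix of background-corrected,
## normalized intensities; `probeset` names the probe set of each row.
## summarize_median is a hypothetical helper, not part of oligo or affy.
summarize_median <- function(intens, probeset) {
  logged <- log2(intens)
  groups <- split(seq_len(nrow(logged)), probeset)
  ## One row per probe set: the median of its replicate probes in each sample.
  do.call(rbind, lapply(groups, function(i) {
    apply(logged[i, , drop = FALSE], 2, median)
  }))
}

## Toy data: two probe sets with three replicate probes each, two samples.
intens <- matrix(c(4, 16, 8, 64, 64, 64,
                   2,  2, 8, 32, 32, 32),
                 ncol = 2, dimnames = list(NULL, c("s1", "s2")))
probeset <- rep(c("a", "b"), each = 3)

expr <- summarize_median(intens, probeset)
print(expr)
```

The resulting matrix has probe sets in rows and samples in columns, so
it could be wrapped with ExpressionSet(assayData = expr) before going
into limma.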

Thanks,

Stephen

On Thu, Nov 8, 2012 at 6:01 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Stephen,
>
>
> On 11/8/2012 5:25 PM, Stephen Turner wrote:
>>
>> Thanks much. I used read.celfiles() and rma() worked perfectly at this
>> point. I will definitely take you up on help getting this to gel with
>> the rest of my workflow.
>>
>> My next step with gene ST arrays is to annotate the expressionset
>> object with fData, such that when I use topTable() later on, all my
>> results are annotated. E.g.:
>>
>> ## Which annotation package are you using?
>> eset@annotation
>> annodb <- "hugene10sttranscriptcluster.db"
>>
>> ## Annotate the features (lookUp() comes from the annotate package)
>> ls(paste("package:", annodb, sep=""))
>> ID <- featureNames(eset)
>> Symbol <- as.character(lookUp(ID, annodb, "SYMBOL"))
>> Name <- as.character(lookUp(ID, annodb, "GENENAME"))
>> Entrez <- as.character(lookUp(ID, annodb, "ENTREZID"))
>> tmp <- data.frame(ID=ID, Entrez=Entrez, Symbol=Symbol, Name=Name,
>>                   stringsAsFactors=FALSE)
>> ## Missing lookups were coerced to the string "NA"; make them real NAs
>> tmp[tmp=="NA"] <- NA
>> fData(eset) <- tmp
>>
>> But I'm not sure what to do here because ls("package:pd.mirna.3.0")
>> doesn't return what the typical hu/mogene10sttranscriptcluster.db DBs
>> return.
>
>
> Right. Note that something like the MoGene ST chip measures mRNA, whereas
> the mirna 3.0 measures miRNA, which is a completely different class of RNA.
> While some miRNAs have Entrez Gene IDs, they don't have symbols or names
> that I know of.
>
> miRNAs target various mRNA species for either silencing (by binding to the
> mRNA transcript, making it double stranded in a particular region, thereby
> eliminating translation to protein) or for premature degradation.
>
> To make things more complicated, the mRNAs that are thought to be targeted
> by a given miRNA are based on one or more of sequence homology,
> conservation, thermodynamic properties and something else that escapes me
> right now. In other words, the targeting of mRNA by miRNA is almost always
> computationally derived. So depending on which algorithm (and what cutoffs)
> you use, you can get from zero to thousands of mRNAs targeted by a given
> miRNA.
>
> As an example, go here:
>
> http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0003205
>
> this is just some random miRNA I searched for. Now scroll down to the
> 'Mature sequence' section, and click on some of the links for Predicted
> targets. Fun, huh?
>
> Also note that the miR 3.0 chip has miRNA for lots of different species, as
> well as the hairpin configuration (which AFAICT is all garbage, but YMMV).
> So you may or may not want to be filtering out miRNA for uninteresting
> species, depending on whether or not you (or your PI) think a particular
> miRNA from say M. nemestrina is also expressed in the species you are
> working with.
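
A rough sketch of that kind of species filter on the feature names. The
naming convention is an assumption to check against featureNames() on
real data: a species prefix such as "hsa-", and hairpin probe sets
prefixed with "hp_". The helper keep_species and the example IDs are
hypothetical.

```r
## Assumed naming convention (verify on your own data): probe set names start
## with a species code followed by "-", and hairpin probe sets carry "hp_".
keep_species <- function(ids, species = "hsa", drop_hairpin = TRUE) {
  is_hairpin <- grepl("^hp_", ids)
  bare <- sub("^hp_", "", ids)
  is_species <- grepl(paste0("^", species, "-"), bare)
  is_species & !(drop_hairpin & is_hairpin)
}

## Made-up IDs in the assumed style, for illustration only.
ids <- c("hsa-miR-21_st", "hp_hsa-mir-21_st", "mne-miR-21_st", "mmu-let-7a_st")
print(ids[keep_species(ids)])

## On a real object the same logical vector would subset rows, e.g.
## eset[keep_species(featureNames(eset)), ]
```

Setting drop_hairpin = FALSE keeps the hairpin probe sets for the chosen
species, in case they turn out to be useful after all.
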
>
> Also note that RMA is sort of silly for these arrays anyway. A mature miRNA
> is 21-23 bases long, and the affy chip uses 25 mers. So the replicate probes
> in a probeset are usually just the same thing in a different place on the
> chip. You could make the argument that the algorithm used in the miRNA QC
> tool that Affy will give you for free does a better job.
>
> So is the goal here to just find differentially expressed miRNAs?
>
> Best,
>
> Jim
>
>
>
>>
>> Many thanks,
>>
>> Stephen
>>
>> On Thu, Nov 8, 2012 at 10:32 AM, Benilton Carvalho
>> <beniltoncarvalho at gmail.com>  wrote:
>>>
>>> The problem is that you have both affy and oligo loaded simultaneously
>>> (I'll add this to my todo list, so in the future users do not need to
>>> worry about it).
>>>
>>> Option 1)  (don't load oligo)
>>>
>>> By using ReadAffy(), you're importing the data via the affy package,
>>> which does not know how to handle miRNA-3.0 arrays.
>>>
>>> If you'd rather stick to your original workflow, you'd need to follow the
>>> "unrecommended" path of converting a PGF to a CDF (I'd rather not say
>>> much about this), and then build the required annotation packages
>>> yourself.
>>>
>>>
>>> Option 2) (don't load affy)  (disclaimer: I'm the author of oligo)
>>>
>>> If you don't load affy and use read.celfiles (from oligo), you'll get the
>>> rma() part done easily. At this point, I'd be happy to work with you to
>>> incorporate tools to simplify the use of the other packages that you have
>>> in your workflow.
>>>
>>>
>>> best,
>>> benilton
>>>
>>>
>>> On 8 November 2012 15:12, Stephen Turner<vustephen at gmail.com>  wrote:
>>>>
>>>> Just wanted to resurrect this issue. I routinely analyze gene 1.0 ST
>>>> chips in my core, but this is the first time I'm looking at the miRNA
>>>> 3.0 chip (or any Affy miRNA chip for that matter).
>>>>
>>>> I understand that there's no 3.0 CDF environment available. How might
>>>> I go about building one and incorporating that into my workflow?
>>>>
>>>> My typical [Hu/Mo]Gene 1.0 ST workflow goes something like this:
>>>>
>>>> ############################################
>>>> ## Load data
>>>> affybatch<- ReadAffy(filenames)
>>>> eset<- rma(affybatch)
>>>>
>>>> ## Annotate
>>>> ID<- featureNames(eset)
>>>> Symbol<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "SYMBOL"))
>>>> Name<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "GENENAME"))
>>>> fData(eset)<- data.frame(ID=ID, Symbol=Symbol, Name=Name)
>>>>
>>>> ## Typical QC with arrayQualityMetrics and analysis with limma
>>>> ############################################
>>>>
>>>> I'm getting this error when using rma() on the affybatch object:
>>>>
>>>>> rma(affybatch)
>>>>
>>>> Error in function (classes, fdef, mtable)  :
>>>>    unable to find an inherited method for function "rma", for signature "AffyBatch"
>>>>
>>>> And additionally when I try to view the affybatch:
>>>>
>>>> AffyBatch object
>>>> size of arrays=541x541 features (19 kb)
>>>> cdf=miRNA-3_0 (??? affyids)
>>>> number of samples=6
>>>> Error in getCdfInfo(object) :
>>>>    Could not obtain CDF environment, problems encountered:
>>>> Specified environment does not contain miRNA-3_0
>>>> Library - package mirna30cdf not installed
>>>> Bioconductor - mirna30cdf not available
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>> sessionInfo()
>>>>
>>>> R version 2.15.0 (2012-03-30)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>  [1] pd.mirna.3.0_3.6.0         oligo_1.22.0               oligoClasses_1.20.0
>>>>  [4] RSQLite_0.11.2             DBI_0.2-5                  biomaRt_2.14.0
>>>>  [7] VennDiagram_1.5.1          SPIA_2.8.0                 pvclust_1.2-2
>>>> [10] genefilter_1.40.0          gplots_2.11.0              MASS_7.3-22
>>>> [13] KernSmooth_2.23-8          caTools_1.13               bitops_1.0-4.1
>>>> [16] gdata_2.12.0               gtools_2.7.0               limma_3.14.1
>>>> [19] arrayQualityMetrics_3.14.0 annotate_1.36.0            AnnotationDbi_1.20.2
>>>> [22] affy_1.36.0                Biobase_2.18.0             BiocGenerics_0.4.0
>>>> [25] BiocInstaller_1.8.3
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] affxparser_1.30.0     affyio_1.26.0         affyPLM_1.34.0        beadarray_2.8.1
>>>>  [5] BeadDataPackR_1.10.0  Biostrings_2.26.2     bit_1.1-9             Cairo_1.5-1
>>>>  [9] cluster_1.14.3        codetools_0.2-8       colorspace_1.2-0      ff_2.2-9
>>>> [13] foreach_1.4.0         gcrma_2.30.0          GenomicRanges_1.10.3  Hmisc_3.10-1
>>>> [17] hwriter_1.3           IRanges_1.16.4        iterators_1.0.6       lattice_0.20-10
>>>> [21] latticeExtra_0.6-24   parallel_2.15.0       plyr_1.7.1            preprocessCore_1.20.0
>>>> [25] RColorBrewer_1.0-5    RCurl_1.95-1.1        reshape2_1.2.1        setRNG_2011.11-2
>>>> [29] splines_2.15.0        stats4_2.15.0         stringr_0.6.1         survival_2.36-14
>>>> [33] SVGAnnotation_0.93-1  tools_2.15.0          vsn_3.26.0            XML_3.95-0.1
>>>> [37] xtable_1.7-0          zlibbioc_1.4.0
>>>>
>>>>
>>>> On Sat, Oct 13, 2012 at 12:56 AM, Dana Most<danamost at gmail.com>  wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> Have you managed to find a cdf for the miRNA 3.0?
>>>>> I keep getting the error : "...cdf=miRNA-3_0 (??? affyids)..."
>>>>>
>>>>> When I spoke to Affymetrix they said that the 3.0 version doesn't have
>>>>> a .cdf and that a .cdf format wouldn't be compatible...
>>>>> They said I should use the 'xps' package on the bioconductor website
>>>>> together with a .pgf from their website.
>>>>> 'xps' doesn't work with Windows 7, which unfortunately is what I have.
>>>>>
>>>>> Can anyone help me?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dana
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>