[BioC] Coercing matrix into expression set, for normalization of only subsets of miRNAs (affy miRNA3.0)

James W. MacDonald jmacdon at uw.edu
Thu Oct 24 16:07:26 CEST 2013

Hi Dana,

On Wednesday, October 23, 2013 12:04:31 AM, Dana Most wrote:
> Dear All,
> How can I transform/coerce a gene expression matrix into an expression set?
> I'm using affy miRNA 3.0 data and I would like to normalize only a subset of
> the samples (I have 4 groups of samples and would like to choose 2) and only a
> subset of microRNAs (I have mature and premature microRNAs and they should not
> be normalized together).
> It should look something like this:
> affyExpressionFS <- read.celfiles(celFiles, pkgname="pd.mirna.3.0")
> data = exprs(affyExpressionFS)
> data = data[1:1000,1:20]

Why do you think the first 1000 rows are useful here? Is this just 
supposed to be an example?

> exprsData = coerce data into expression set
> rma(exprsData)

You can't run rma on an ExpressionSet, as an ExpressionSet is intended 
to contain summarized data. Instead you need to use an 
ExpressionFeatureSet object (which is what you are getting your matrix 
of data out of).
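
On the narrower question of wrapping a plain matrix in an ExpressionSet 
(for example, to hand already-summarized values to arrayQualityMetrics), 
a minimal sketch follows. It assumes the Biobase package is installed; 
the matrix values and the feature/sample names are made up for 
illustration and are not from your data.

```r
# Sketch only: requires the Biobase package; skipped if it is not installed.
if (requireNamespace("Biobase", quietly = TRUE)) {
  # Hypothetical summarized log2 values: 5 features x 4 samples
  summarized <- matrix(seq(6, 8, length.out = 20), nrow = 5,
                       dimnames = list(paste0("feature", 1:5),
                                       paste0("sample", 1:4)))
  # Wrap the matrix; features are rows, samples are columns
  eset <- Biobase::ExpressionSet(assayData = summarized)
  # Subsetting works like a matrix, here keeping two of the four samples
  eset_subset <- eset[, c("sample1", "sample2")]
  print(dim(Biobase::exprs(eset_subset)))
}
```

An object built this way holds summarized values only, so it is a 
candidate input for quality-control tools but not for rma().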

That said, you will have to do some serious coding if you want to 
accomplish this. Right now there is no easy way (that I know of - 
Benilton might correct me here) to subset to a particular set of 
probes. You can check out the oligo source from subversion and make 
whatever changes you want. There is even a 'subset' argument that is 
for future use that you could implement if you want.

But this leads me to your original rationale for wanting to do this, 
where you state that mature and precursor miRNAs should not be 
normalized together. I am not sure why you would think this, and I am 
pretty sure you are wrong.

You could argue that the hairpin miRNAs are fundamentally different 
from the mature miRNAs (which I suppose they are), but that has nothing 
to do with normalization. For the normalization to be reasonable, the 
data have to meet two criteria. First, most probes should not be 
differentially expressed between samples, and second, the underlying 
distributions of the data should not be completely dissimilar.

This has nothing to do with what the probes are supposed to measure, 
nor whether or not the probes are even measuring anything at all. So I 
don't see any real reason to separate the hairpin from mature miRNAs 
prior to normalizing.
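
To make that concrete, here is a toy base R sketch of quantile 
normalization, which is the default normalization step in rma(). It is 
an illustration only, not oligo's actual implementation; the function 
name quantile_normalize and the simulated intensities are assumptions 
made up for this example. Note that the function only ever looks at the 
ranks of the intensities within each sample, never at what a probe is 
meant to measure.

```r
# Minimal quantile normalization in base R (illustrative sketch only;
# rma() in oligo uses its own compiled implementation).
quantile_normalize <- function(x) {
  # Rank each column, breaking ties by order of appearance
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across samples
  reference <- unname(rowMeans(apply(x, 2, sort)))
  # Replace each value with the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

set.seed(1)
# Hypothetical log2 intensities: 6 probes x 3 samples (names are made up)
toy <- matrix(rnorm(18, mean = 8, sd = 2), nrow = 6,
              dimnames = list(paste0("probe", 1:6), paste0("sample", 1:3)))
normalized <- quantile_normalize(toy)

# After normalization every sample shares the same sorted values
sorted_columns <- apply(normalized, 2, sort)
```

Within each sample the ordering of the probes is unchanged; only the 
values are mapped onto a common distribution.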

> Also, I would like to use array quality metrics package on exprsData
> arrayQualityMetrics(expressionset = exprsData,
> outdir = "exprsData",force = FALSE, do.logtransform = FALSE)
> Thank you,
> Dana

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
