[BioC] ExpressionSet or MAList

Martin Morgan mtmorgan at fhcrc.org
Thu May 1 23:26:46 CEST 2008


Hi Daniel --

Daniel Brewer <daniel.brewer at icr.ac.uk> writes:

> Martin Morgan wrote:
>> Hi Daniel --
>> 
>> Daniel Brewer <daniel.brewer at icr.ac.uk> writes:
>> 
>>> Hi,
>>>
>>> I am starting to think about grouping a series of microarray datasets
>>> into bioconductor objects so that I can quickly look to see how a gene
>>> behaves in each dataset.  The two main options seem to be to use
>>> ExpressionSet or Limma's MAList.  Has anyone got an opinion on which
>>> would be best to use or the advantages and disadvantages of both.
>> 
>> Some biases on my part, but...
>> 
>> I guess either ExpressionSet or MAList is really meant to represent a
>> single 'experiment'. Sounds like you're going to create a collection
>> of experiments, so a collection of ExpressionSet or MAList objects (it
>> would be a mistake, I think, to jam all your experiments into a single
>> object of either of these classes).
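A minimal sketch of the "collection of experiments" idea. It assumes Biobase is installed and uses two hypothetical helpers, `makeToySet()` (builds a toy ExpressionSet of random numbers) and `geneAcross()` (pulls one gene out of every set that measured it). Neither comes from Biobase. Each experiment stays a separate ExpressionSet in a named list:

```r
library(Biobase)

set.seed(42)
## Hypothetical helper: a toy ExpressionSet filled with random values
makeToySet <- function(genes, samples) {
  m <- matrix(rnorm(length(genes) * length(samples)),
              nrow = length(genes), dimnames = list(genes, samples))
  ExpressionSet(assayData = m)
}

## One ExpressionSet per experiment, kept together in a named list
studies <- list(
  studyA = makeToySet(c("g1", "g2", "g3"), c("a1", "a2")),
  studyB = makeToySet(c("g2", "g3", "g4"), c("b1", "b2", "b3"))
)

## Hypothetical helper: expression of 'gene' in every study that measured it
geneAcross <- function(esets, gene) {
  hits <- Filter(function(e) gene %in% featureNames(e), esets)
  lapply(hits, function(e) exprs(e)[gene, ])
}

geneAcross(studies, "g2")  # named list: studyA (2 values), studyB (3 values)
```

Because nothing is merged, each study keeps its own samples, phenoData and dimensions.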
>> 
>>> To my mind MAList stores the annotation with the dataset which I feel is
>> 
>> Storing annotations with the object can be a bad thing if several
>> objects share the same annotation, because then there are effectively
>> different variants of the same annotation, one for each object. These
>> will inevitably drift apart, leading to confusion. There is also a
>> memory use issue.
>> 
>> That said, annotations can be added to ExpressionSet, specifically
>> using featureData to store an AnnotatedDataFrame (data.frame +
>> annotation on column labels).
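A sketch of attaching per-probe annotation through featureData, assuming Biobase. The probe IDs, symbols and chromosomes are made up. The varMetadata data.frame holds the "annotation on column labels" as a `labelDescription` column. The last step shows the alternative that avoids per-object copies: the `annotation` slot records only the *name* of a shared annotation package ("hgu95av2" is just an example string and is not loaded here):

```r
library(Biobase)

m <- matrix(rnorm(6), nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))

## Made-up per-probe annotation; row names must equal the featureNames
ann <- data.frame(symbol = c("GENEA", "GENEB", "GENEC"),
                  chromosome = c("1", "7", "X"),
                  row.names = rownames(m), stringsAsFactors = FALSE)
## varMetadata describes each annotation column
meta <- data.frame(labelDescription = c("Gene symbol", "Chromosome"),
                   row.names = colnames(ann), stringsAsFactors = FALSE)

eset <- ExpressionSet(assayData = m,
                      featureData = AnnotatedDataFrame(ann, meta))

## Shared-annotation alternative: store only the package name
annotation(eset) <- "hgu95av2"

fData(eset)          # the per-probe data.frame
fvarMetadata(eset)   # the column descriptions
```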
>> 
>>> an advantage whereas ExpressionSet is the base implementation for many
>>> libraries.
>> 
>> ExpressionSet is a little more tightly designed than MAList (MAList is
>> essentially a list and so can contain, or not contain, any data;
>> ExpressionSet is an S4 class that has to contain certain data). While
>> you lose some freedom with ExpressionSet, the constraint probably
>> comes with a benefit in terms of greater certainty about what the
>> object actually contains. This imposed uniformity likely has benefits
>> as the number of experiments you're managing increases. Many users
>> probably view their MAList / ExpressionSet as 'read-only', so for
>> these users the fact that you could do something to mess up an MAList
>> is really only an academic possibility (you can also do things to mess
>> up an ExpressionSet; it is just a bit harder).
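A small illustration of the "tighter design" point, assuming Biobase. A plain list (which is roughly what an MAList is underneath) silently accepts a `weights` matrix with the wrong number of rows. Building an ExpressionSet from the same matrices fails because Biobase checks that all assayData elements have matching dimensions. The `tryCatch` simply turns that error into `NULL`:

```r
library(Biobase)

m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("g", 1:3), c("s1", "s2")))
okWeights  <- matrix(1, nrow = 3, ncol = 2, dimnames = dimnames(m))
badWeights <- matrix(1, nrow = 2, ncol = 2)   # wrong number of rows

## A plain list accepts the mismatch without complaint
lst <- list(M = m, weights = badWeights)
dim(lst$weights)

## ExpressionSet construction is refused; the error becomes NULL
built <- tryCatch(
  ExpressionSet(assayData = assayDataNew(exprs = m, weights = badWeights)),
  error = function(e) NULL)
is.null(built)    # did the dimension check reject it?

## Correctly dimensioned weights are accepted
good <- ExpressionSet(assayData = assayDataNew(exprs = m, weights = okWeights))
```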
>> 
>> ExpressionSet also contains an experimentData slot, which would be an
>> ideal location to document which experiment the ExpressionSet
>> represents.
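A sketch of recording which experiment an ExpressionSet represents, assuming Biobase. The MIAME fields below (name, lab, title, abstract, and the `source` entry in `other`) are invented placeholders, not a recommended schema:

```r
library(Biobase)

m <- matrix(rnorm(4), nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2")))

## Made-up description of the experiment, for illustration only
info <- new("MIAME",
            name = "A. Researcher", lab = "Example Lab",
            title = "Toy study A",
            abstract = "Illustrative random data.",
            other = list(source = "hypothetical source"))

eset <- ExpressionSet(assayData = m, experimentData = info)
experimentData(eset)   # prints a summary of the MIAME fields
```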
>> 
>>> Dan
>> 
>> hope that helps,
>> 
>> Martin
>> 
>
> Hi,
>
> I think you have sold me on the idea of ExpressionSet (mainly because of
> the MIAME stuff in experimentData), but I have one question about it.
> Is there any way to store associated detection p-values/weights with it?

Well, maybe this will un-sell you ;) ExpressionSet is guaranteed to
have an 'exprs' matrix in its assayData. What you want to do is add
another, identically dimensioned, matrix, e.g.,
'weights'. ExpressionSet allows you to do that, though then you're
sort of back in the MAList realm of not being sure exactly what you
have. If you were creating an ExpressionSet from scratch and had an
'exprs' matrix and a 'weights' matrix you could do something like

> new("ExpressionSet", exprs=exprs, weights=weights)

and weights would end up in assayData. Things are a little more
complicated if you have an existing ExpressionSet that you want to add
a matrix to. The basic steps are

> storageMode(obj)
[1] "lockedEnvironment"
> storageMode(obj) = "environment"
> assayData(obj)[["weights"]] = weights
> storageMode(obj) = "lockedEnvironment"
> validObject(obj, complete=TRUE)

First, by default ExpressionSet stores its 'big' data in a special
container called a 'lockedEnvironment'. This container can't normally
be modified, so we change its storage mode to a modifiable form
(this actually makes a copy of the underlying environment; we could
also have changed the storage mode to 'list', and then assayData would
behave like a list). We then add our data and lock the environment
again (locking is important). Finally we check that the object we've
just created conforms to ExpressionSet expectations (e.g., that the
matrix we've added has the right dimensions and dimnames).
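The same four steps as a self-contained script on toy data, assuming Biobase. `m` and `w` are random stand-ins for real expression values and weights. The final lines show `assayDataElement<-`, which current Biobase releases provide as a shorter route that handles the copy and re-lock internally. It is worth checking that it exists in the Biobase version you have installed:

```r
library(Biobase)

set.seed(1)
m <- matrix(rnorm(8), nrow = 4,
            dimnames = list(paste0("g", 1:4), c("s1", "s2")))
w <- matrix(runif(8), nrow = 4, dimnames = dimnames(m))  # toy weights

obj <- ExpressionSet(assayData = m)
storageMode(obj)                          # "lockedEnvironment"

## 1. switch to a modifiable (copied) environment
storageMode(obj) <- "environment"
## 2. add the new, identically dimensioned matrix
assayData(obj)[["weights"]] <- w
## 3. lock the environment again
storageMode(obj) <- "lockedEnvironment"
## 4. check the object still satisfies ExpressionSet's rules
validObject(obj, complete = TRUE)

## Shorter route: copy, assign and re-lock in one replacement call
obj2 <- ExpressionSet(assayData = m)
assayDataElement(obj2, "weights") <- w
```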

Once 'weights' is in assayData, subsetting the ExpressionSet,
accessing the assayData elements (e.g., assayData(obj)[["weights"]]),
etc., should all work as expected.
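A quick check of that claim on toy data, assuming Biobase. The object is built with `assayDataNew()`, which produces the same kind of object as `new("ExpressionSet", exprs=exprs, weights=weights)`. Subsetting by feature and sample names then returns the matching block of both matrices:

```r
library(Biobase)

m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
w <- m / 12   # toy weights with the same dimensions and dimnames

obj <- ExpressionSet(assayData = assayDataNew(exprs = m, weights = w))

sub <- obj[c("g2", "g4"), c("s1", "s3")]
exprs(sub)                      # 2 x 2
assayData(sub)[["weights"]]     # the matching 2 x 2 block of w
```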

The 'convert' package has this coercion method defined

setAs("MAList", "ExpressionSet", function(from)
{
    nM <- new("MIAME")
    notes(nM) <- list("Converted from MAList object, exprs are M-values")
    new("ExpressionSet", exprs = as.matrix(from$M),
        phenoData = new("AnnotatedDataFrame", data=from$targets),
        experimentData = nM)
})

and you might consider writing your own version that customizes which
information is moved from MAList to ExpressionSet.
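One possible customized version, written as a plain function rather than a `setAs()` method so it does not clash with the one 'convert' already defines. `maListToExpressionSet()` is a hypothetical helper, not part of convert or limma. It assumes Biobase and relies only on limma's MAList component names (M, A, weights, genes, targets), accessed with `[[`, so it also accepts an ordinary list. M-values go into `exprs`, while A-values and spot weights, when present, become extra assayData elements:

```r
library(Biobase)

## Hypothetical helper, not part of 'convert' or limma
maListToExpressionSet <- function(from) {
  M <- as.matrix(from[["M"]])
  ## ExpressionSet needs feature and sample names; invent them if absent
  if (is.null(rownames(M))) rownames(M) <- as.character(seq_len(nrow(M)))
  if (is.null(colnames(M))) colnames(M) <- as.character(seq_len(ncol(M)))

  ## M-values become exprs; A-values and weights ride along if present
  elts <- list(exprs = M)
  for (nm in c("A", "weights")) {
    x <- from[[nm]]
    if (!is.null(x)) {
      x <- as.matrix(x)
      dimnames(x) <- dimnames(M)   # errors if the dimensions differ
      elts[[nm]] <- x
    }
  }

  nM <- new("MIAME")
  notes(nM) <- list("Converted from MAList object, exprs are M-values")
  args <- list(assayData = do.call(assayDataNew, elts), experimentData = nM)

  if (!is.null(from[["targets"]])) {      # one row per array -> phenoData
    pd <- as.data.frame(from[["targets"]])
    rownames(pd) <- colnames(M)
    args$phenoData <- AnnotatedDataFrame(pd)
  }
  if (!is.null(from[["genes"]])) {        # one row per probe -> featureData
    fd <- as.data.frame(from[["genes"]])
    rownames(fd) <- rownames(M)
    args$featureData <- AnnotatedDataFrame(fd)
  }
  do.call(ExpressionSet, args)
}
```

With limma loaded, calling it on a normalized MAList (e.g. `maListToExpressionSet(MA)`) should keep the weights alongside the M-values. Probe IDs used as row names must be unique for featureData to be built.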

Hope that helps,

Martin

>  This would be useful information to retain for later analysis.
>
> Dan
>
> -- 
> **************************************************************
> Daniel Brewer, Ph.D.
>
> Institute of Cancer Research
> Email: daniel.brewer at icr.ac.uk
> **************************************************************


