[BioC] limma: strange subsetting fit[, i] of MArrayLM object design

Gordon K Smyth smyth at wehi.EDU.AU
Wed May 14 09:36:46 CEST 2014


Let me clarify that further.

fit[,j] should not subset the design matrix at all.  The effect of fit[,j] 
is actually the same as:

   contrasts.fit(fit, contrast=u)

where u[j]=1 and u is otherwise zero.  The design matrix is kept intact, 
but the subsetted object keeps track of which contrasts are represented in 
the coefficients matrix by means of the contrasts matrix.

For example, the following code will reproduce in limma 3.19.10 and 
earlier or limma 3.21.2 and later:

   > example(lmFit)
   > fit[,2]$design
      Grp1 Grp2vs1
   [1,]    1       0
   [2,]    1       0
   [3,]    1       0
   [4,]    1       1
   [5,]    1       1
   [6,]    1       1
   > fit[,2]$contrasts
              Contrast
   Coefficient Grp2vs1
       Grp1          0
       Grp2vs1       1

Best wishes
Gordon

On Wed, 14 May 2014, Gordon K Smyth wrote:

> Dear Maciek,
>
> The second index of an MArrayLM object subsets by coefficients.  You 
> will easily see by typing dim(fit) or colnames(fit) and so on that the 
> columns refer to coefficients.
>
> You are right that the current subsetting of the design matrix is 
> incorrect. I've fixed this now in bioc-devel.
>
> All the limma subsetting code has been rewritten in past 12 months and 
> the treatment of the design matrix was overlooked.  In any case, 
> subsetting a design matrix is not always a sensible thing to do, and the 
> subsetted matrix is not normally used downstream.
>
> Please upgrade to the latest version of Bioconductor.
>
> Best wishes
> Gordon
>
>
>
> On Mon, 12 May 2014, Maciek Sykulski wrote:
>
>> Dear Gordon, dear maintainers of limma package,
>> 
>> While writing some code based on the R package limma, Bioconductor version:
>> Release 2.13,
>> I’ve noticed some strange code in topTable, and in subsetting fit[,i] of
>> "MArrayLM" object (I’m not sure if this results in any bug).
>> 
>> Assume a following execution:
>> 
>>> fit <- lmFit(data,design,...)
>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F")
>> 
>> 
>> specifically, in the function topTable, the line fit <- eBayes(fit[, coef])
>> raises my concerns (see comment below in the code).
>> 
>> topTable <-function(...) {
>> 
>> [...]
>> 
>> if (length(coef) > 1) {
>>
>>    coef <- unique(coef)
>>
>>    if (length(fit$coef[1, coef]) < ncol(fit))
>>
>>        fit <- eBayes(fit[, coef])
>>
>>        # ^ Above reevaluates eBayes without robust=TRUE
>>
>>        # even if we want a robust estimation
>> 
>> [...]
>> 
>> }
>> 
>> 
>> Second confusion is about subsetting of fit object fit[, coef].
>> 
>> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does
>> fit[,coef] perform? Is it by samples, or by coefficients?
>> 
>> After inspection I see that fit[,coef] subsets the fit$design matrix on
>> samples, not on coefficients (while it does subset on columns from
>> fit$coefficients).
>> Inspection of the code responsible for subsetting "MArrayLM" reveals:
>> 
>> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX)
>> {
>>    [...]
>>        for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE]
>>        for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE]
>>        for(a in I ) object[[a]] <- object[[a]][i]
>>        for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE]
>>        #Shouldn`t this line above look more like this?:
>>        #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE]
>>    [...]
>> }
>> 
>> 
>> Here is an example of a design matrix (rows correspond to samples, columns
>> to fitted coefficients):
>> 
>>> class(fit)
>> [1] "MArrayLM"
>> attr(,"package")
>> [1] "limma"
>>> fit$design
>>    cluster.1 cluster.2 cluster.3 cluster.4
>> 1           1         1         0         0
>> 2           1         1         0         1
>> 3           1         1         0         1
>> [...]
>> 42          1         1         0         0
>> 43          1         1         0         0
>> 44          1         0         0         0
>> [...]
>> 
>> 
>> Now, what I’d expect from an operation which subsets coefficients from a
>> MArrayLM model
>> (as in topTable line fit <- eBayes(fit[, coef])) is this
>>
>>    cluster.2 cluster.3
>> 1           1         0
>> 2           1         0
>> 3           1         0
>> [...]
>> 
>> 
>> moreover fit[,2:3]$coefficients is modified exactly in this way.
>> 
>> However, the resulting fit[,2:3]$design is a subset on samples:
>> 
>>> fit[,2:3]$design
>>    cluster.1 cluster.2 cluster.3 cluster.4
>> 2           1         1         0         1
>> 3           1         1         0         1
>> 
>> 
>> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to
>> achieve? This may be relevant, since subsetting fit this way propagates
>> further inside eBayes function, where it causes classifyTestsF not to run
>> (in case of my data), because of fit[,coef]$design being singular.
>> 
>> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4),
>>    trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1))
>> {
>>    [...]
>>    #!!! Below condition always evaluates to false
>>    # because fit$design was malformed before and
>>    # is.fullrank always returns false
>>    if (!is.null(fit$design) && is.fullrank(fit$design)) {
>>        F.stat <- classifyTestsF(fit, fstat.only = TRUE)
>>        fit$F <- as.vector(F.stat)
>>   [...]
>> }
>> 
>> 
>> I’m not pointing to any specific misbehavior/bug, because it seems that
>> 
>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N)
>> 
>> still somehow produces the same results as my modified:
>> 
>> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",robust=TRUE)
>> 
>> 
>> and different from non-robust:
>> 
>> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F")
>> 
>> 
>> So the final question is:
>> Does fit[,coef] modifies the fit$design matrix properly, in alignment with
>> later check is.fullrank(fit$design)?
>> 
>> 
>> Attached is a simple example R session with these issues.
>> 
>> Thanks,
>> Maciek Sykulski
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list