[BioC] limma: strange subsetting fit[, i] of "MArrayLM" object's design

Wed May 14 02:04:18 CEST 2014

Dear Maciek,

The second index of an MArrayLM object subsets by coefficients.  You will 
easily see by typing dim(fit) or colnames(fit) and so on that the columns 
refer to coefficients.

You are right that the current subsetting of the design matrix is 
incorrect.  I've fixed this now in bioc-devel.

All the limma subsetting code has been rewritten in past 12 months and the 
treatment of the design matrix was overlooked.  In any case, subsetting a 
design matrix is not always a sensible thing to do, and the subsetted 
matrix is not normally used downstream.

Please upgrade to the latest version of Bioconductor.

Best wishes
Gordon

On Mon, 12 May 2014, Maciek Sykulski wrote:

> Dear Gordon, dear maintainers of limma package,
>
> While writing some code based on the R package limma, Bioconductor version:
> Release 2.13,
> I’ve noticed some strange code in topTable, and in subsetting fit[,i] of
> "MArrayLM" object (I’m not sure if this results in any bug).
>
> Assume a following execution:
>
>> fit <- lmFit(data,design,...)
>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F")
>
>
> specifically, in the function topTable, the line fit <- eBayes(fit[, coef])
> raises my concerns (see comment below in the code).
>
> topTable <-function(...) {
>
> [...]
>
> if (length(coef) > 1) {
>
>    coef <- unique(coef)
>
>    if (length(fit$coef[1, coef]) < ncol(fit))
>
>        fit <- eBayes(fit[, coef])
>
>        # ^ Above reevaluates eBayes without robust=TRUE
>
>        # even if we want a robust estimation
>
> [...]
>
> }
>
>
> Second confusion is about subsetting of fit object fit[, coef].
>
> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does
> fit[,coef] perform? Is it by samples, or by coefficients?
>
> After inspection I see that fit[,coef] subsets the fit$design matrix on
> samples, not on coefficients (while it does subset on columns from
> fit$coefficients).
> Inspection of the code responsible for subsetting "MArrayLM" reveals:
>
> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX)
> {
>    [...]
>        for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE]
>        for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE]
>        for(a in I ) object[[a]] <- object[[a]][i]
>        for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE]
>        #Shouldn`t this line above look more like this?:
>        #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE]
>    [...]
> }
>
>
> Here is an example of a design matrix (rows correspond to samples, columns
> to fitted coefficients):
>
>> class(fit)
> [1] "MArrayLM"
> attr(,"package")
> [1] "limma"
>> fit$design
>    cluster.1 cluster.2 cluster.3 cluster.4
> 1           1         1         0         0
> 2           1         1         0         1
> 3           1         1         0         1
> [...]
> 42          1         1         0         0
> 43          1         1         0         0
> 44          1         0         0         0
> [...]
>
>
> Now, what I’d expect from an operation which subsets coefficients from a
> MArrayLM model
> (as in topTable line fit <- eBayes(fit[, coef])) is this
>
>    cluster.2 cluster.3
> 1           1         0
> 2           1         0
> 3           1         0
> [...]
>
>
> moreover fit[,2:3]$coefficients is modified exactly in this way.
>
> However, the resulting fit[,2:3]$design is a subset on samples:
>
>> fit[,2:3]$design
>    cluster.1 cluster.2 cluster.3 cluster.4
> 2           1         1         0         1
> 3           1         1         0         1
>
>
> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to
> achieve? This may be relevant, since subsetting fit this way propagates
> further inside eBayes function, where it causes classifyTestsF not to run
> (in case of my data), because of fit[,coef]$design being singular.
>
> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4),
>    trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1))
> {
>    [...]
>    #!!! Below condition always evaluates to false
>    # because fit$design was malformed before and
>    # is.fullrank always returns false
>    if (!is.null(fit$design) && is.fullrank(fit$design)) {
>        F.stat <- classifyTestsF(fit, fstat.only = TRUE)
>        fit$F <- as.vector(F.stat)
>   [...]
> }
>
>
> I’m not pointing to any specific misbehavior/bug, because it seems that
>
> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N)
>
> still somehow produces the same results as my modified:
>
> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",robust=TRUE)
>
>
> and different from non-robust:
>
> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F")
>
>
> So the final question is:
> Does fit[,coef] modifies the fit$design matrix properly, in alignment with
> later check is.fullrank(fit$design)?
>
>
> Attached is a simple example R session with these issues.
>
> Thanks,
> Maciek Sykulski
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}