[BioC] limma: strange subsetting fit[, i] of "MArrayLM" object`s design matrix, also robust=TRUE and topTable

Maciek Sykulski macieksk at mimuw.edu.pl
Mon May 12 20:31:08 CEST 2014


Dear Gordon, dear maintainers of limma package,

While writing some code based on the R package limma, Bioconductor version:
Release 2.13,
I’ve noticed some strange code in topTable, and in subsetting fit[,i] of
"MArrayLM" object (I’m not sure if this results in any bug).

Assume a following execution:

> fit <- lmFit(data,design,...)
> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F")


specifically, in the function topTable, the line fit <- eBayes(fit[, coef])
raises my concerns (see comment below in the code).

topTable <-function(...) {

[...]

if (length(coef) > 1) {

    coef <- unique(coef)

    if (length(fit$coef[1, coef]) < ncol(fit))

        fit <- eBayes(fit[, coef])

        # ^ Above reevaluates eBayes without robust=TRUE

        # even if we want a robust estimation

[...]

}


Second confusion is about subsetting of fit object fit[, coef].

fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does
fit[,coef] perform? Is it by samples, or by coefficients?

After inspection I see that fit[,coef] subsets the fit$design matrix on
samples, not on coefficients (while it does subset on columns from
fit$coefficients).
Inspection of the code responsible for subsetting "MArrayLM" reveals:

subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX)
{
    [...]
        for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE]
        for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE]
        for(a in I ) object[[a]] <- object[[a]][i]
        for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE]
        #Shouldn`t this line above look more like this?:
        #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE]
    [...]
}


Here is an example of a design matrix (rows correspond to samples, columns
to fitted coefficients):

> class(fit)
[1] "MArrayLM"
attr(,"package")
[1] "limma"
> fit$design
    cluster.1 cluster.2 cluster.3 cluster.4
1           1         1         0         0
2           1         1         0         1
3           1         1         0         1
[...]
42          1         1         0         0
43          1         1         0         0
44          1         0         0         0
[...]


Now, what I’d expect from an operation which subsets coefficients from a
MArrayLM model
(as in topTable line fit <- eBayes(fit[, coef])) is this

    cluster.2 cluster.3
1           1         0
2           1         0
3           1         0
[...]


moreover fit[,2:3]$coefficients is modified exactly in this way.

However, the resulting fit[,2:3]$design is a subset on samples:

> fit[,2:3]$design
    cluster.1 cluster.2 cluster.3 cluster.4
2           1         1         0         1
3           1         1         0         1


Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to
achieve? This may be relevant, since subsetting fit this way propagates
further inside eBayes function, where it causes classifyTestsF not to run
(in case of my data), because of fit[,coef]$design being singular.

eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4),
    trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1))
{
    [...]
    #!!! Below condition always evaluates to false
    # because fit$design was malformed before and
    # is.fullrank always returns false
    if (!is.null(fit$design) && is.fullrank(fit$design)) {
        F.stat <- classifyTestsF(fit, fstat.only = TRUE)
        fit$F <- as.vector(F.stat)
   [...]
}


I’m not pointing to any specific misbehavior/bug, because it seems that

topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N)

still somehow produces the same results as my modified:

topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",robust=TRUE)


and different from non-robust:

topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F")


So the final question is:
Does fit[,coef] modifies the fit$design matrix properly, in alignment with
later check is.fullrank(fit$design)?


Attached is a simple example R session with these issues.

Thanks,
Maciek Sykulski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: limma_debug_session.pdf
Type: application/pdf
Size: 75559 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140512/2762dc29/attachment.pdf>


More information about the Bioconductor mailing list