[BioC] Problem with topTable function (after Bioconductor update)

Christian De Santis christian.desantis at stir.ac.uk
Wed Oct 30 19:37:47 CET 2013

Hi Jim,

Thank for the detailed explanation and yes, there was some humour together with some truth! :-)
It really drove me crazy as it was working up until yesterday. That's why I thought it must have come with the Bioconductor update.

I tried what you said and it worked just fine!!! Just a note for someone else might have the same problem and read this post, there was a misspelling in your code (P.Value instead of P.value).

Thanks so much for taking the time to answer and rescue me. I really appreciated it!


-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu] 
Sent: 30 October 2013 18:08
To: Christian De Santis
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Problem with topTable function (after Bioconductor update)

Hi Christian,

On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote:
> Hi to everybody,
> I hope someone can help me on this as I am losing my health.

Not sure if you were intending humor, but this cracked me up!

> I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error.
>> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", 
>> coef=c(1:4), number=20000, p.value = 0.05)
> Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>    duplicate 'row.names' are not allowed In addition: Warning message:
> non-unique values when setting 'row.names':
> I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object.
> The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release.
> I really hope someone can help me understand.

There appears to be a bug in the function topTableF.

When you run topTable() on multiple coefficients, it actually uses topTableF(). In that function there is a bit of code where the data are subsetted according to the p.value and lfc values you submitted:

 if (lfc > 0 || p.value < 1) {
        if (lfc > 0)
            big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0
        else big <- TRUE
        if (p.value < 1) {
            sig <- adj.P.Value <= p.value
            sig[is.na(sig)] <- FALSE
        else sig <- TRUE
        keep <- big & sig
        if (!all(keep)) {
            M <- M[keep, , drop = FALSE]
            rn <- rn[keep]
            Amean <- Amean[keep]
            Fstat <- Fstat[keep]
            Fp <- p.value[keep]
            genelist <- genelist[keep, , drop = FALSE]
            adj.P.Value <- adj.P.Value[keep]

So now all the data have been subsetted (in your case) to just those genes with an unadjusted p-value < 0.05. And presumably that is fewer than the 20000 genes that you started with. But then there is some sorting of the resulting data:

if (sort.by == "F")
        o <- order(Fp, decreasing = FALSE)[1:number]
    else o <- 1:number
    if (is.null(genelist))
        tab <- data.frame(M[o, , drop = FALSE])
    else tab <- data.frame(genelist[o, , drop = FALSE], M[o,
        , drop = FALSE])
    tab$AveExpr = fit$Amean[o]
    tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val =
    rownames(tab) <- rn[o]

The problem here is that 1:number is _longer_ than Fp, which has been subsetted already, based on the p-value criterion. So the vector 'o' 
will have a bunch of NAs on the end. As an example:

[1]  1  2  3  4  5 NA NA NA

And when you get to the last line there, where the rownames are set, since there are multiple NAs, you will have repeated rownames, which R won't allow.

In the interest of your health, you can get around this for now by doing

num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000)$P.value < 0.05)

and then

x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=num, p.value = 0.05)



> Regards,
> Christian De Santis
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] reshape2_1.2.2       ellipse_0.3-8        genefilter_1.44.0    limma_3.18.1         BiocInstaller_1.12.0
> loaded via a namespace (and not attached):
>   [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0       BiocGenerics_0.8.0   DBI_0.2-7
>   [6] IRanges_1.20.3       parallel_3.0.1       plyr_1.8             RSQLite_0.11.4       splines_3.0.1
> [11] stats4_3.0.1         stringr_0.6.2        survival_2.37-4      tools_3.0.1          XML_3.98-1.1
> [16] xtable_1.7-1

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

The University of Stirling has been ranked in the top 12 of UK universities for graduate employment*.
94% of our 2012 graduates were in work and/or further study within six months of graduation.
*The Telegraph
The University of Stirling is a charity registered in Scotland, number SC 011159.

More information about the Bioconductor mailing list