[BioC] Problem with topTable function (after Bioconductor update)

Gordon K Smyth smyth at wehi.EDU.AU
Mon Nov 4 08:12:10 CET 2013


Dear James and Christian,

There was a bug in topTableF().  Now fixed.

Thanks for pointing it out.

Best wishes
Gordon


> Date: Wed, 30 Oct 2013 14:08:29 -0400
> From: "James W. MacDonald" <jmacdon at uw.edu>
> To: Christian De Santis <christian.desantis at stir.ac.uk>
> Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: Re: [BioC] Problem with topTable function (after Bioconductor
> 	update)
>
> Hi Christian,
>
> On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote:
>> Hi to everybody,
>>
>> I hope someone can help me on this as I am losing my health.
>
> Not sure if you were intending humor, but this cracked me up!
>
>>
>> I was re running my script today which worked fine till last time I did 
>> it. When I run the topTable function on more than one coefficient it 
>> gives me the following error.
>>
>>> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05)
>> Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>>    duplicate 'row.names' are not allowed
>> In addition: Warning message:
>> non-unique values when setting 'row.names':
>>
>> I have tried to run the same script for each coefficient separately and 
>> it works fine. I can't explain the error since if there were duplicated 
>> row.names then I should get the same error when doing the analysis on 
>> individual contrasts since I run topTable on the fit object.
>>
>> The only thing I have changed was the fact that this morning I updated 
>> Bioconductor to the latest release.
>>
>> I really hope someone can help me understand.
>
> There appears to be a bug in the function topTableF.
>
> When you run topTable() on multiple coefficients, it actually uses
> topTableF(). In that function there is a bit of code where the data are
> subsetted according to the p.value and lfc values you submitted:
>
> if (lfc > 0 || p.value < 1) {
>        if (lfc > 0)
>            big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0
>        else big <- TRUE
>        if (p.value < 1) {
>            sig <- adj.P.Value <= p.value
>            sig[is.na(sig)] <- FALSE
>        }
>        else sig <- TRUE
>        keep <- big & sig
>        if (!all(keep)) {
>            M <- M[keep, , drop = FALSE]
>            rn <- rn[keep]
>            Amean <- Amean[keep]
>            Fstat <- Fstat[keep]
>            Fp <- p.value[keep]
>            genelist <- genelist[keep, , drop = FALSE]
>            adj.P.Value <- adj.P.Value[keep]
>        }
>
> So now all the data have been subsetted (in your case) to just those
> genes with an unadjusted p-value < 0.05. And presumably that is fewer
> than the 20000 genes that you started with. But then there is some
> sorting of the resulting data:
>
> if (sort.by == "F")
>        o <- order(Fp, decreasing = FALSE)[1:number]
>    else o <- 1:number
>    if (is.null(genelist))
>        tab <- data.frame(M[o, , drop = FALSE])
>    else tab <- data.frame(genelist[o, , drop = FALSE], M[o,
>        , drop = FALSE])
>    tab$AveExpr = fit$Amean[o]
>    tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val =
> adj.P.Value[o])
>    rownames(tab) <- rn[o]
>
> The problem here is that 1:number is _longer_ than Fp, which has been
> subsetted already, based on the p-value criterion. So the vector 'o'
> will have a bunch of NAs on the end. As an example:
>
> (1:5)[1:8]
> [1]  1  2  3  4  5 NA NA NA
>
> And when you get to the last line there, where the rownames are set,
> since there are multiple NAs, you will have repeated rownames, which R
> won't allow.
>
> In the interest of your health, you can get around this for now by doing
>
> num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4),
> number=20000)$P.value < 0.05)
>
> and then
>
> x.contrast.REG <- topTable(fit.contrast, adjust.method="none",
> coef=c(1:4), number=num, p.value = 0.05)
>
> Best,
>
> Jim
>
>
>>
>> Regards,
>> Christian De Santis
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] reshape2_1.2.2       ellipse_0.3-8        genefilter_1.44.0    limma_3.18.1         BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>>   [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0       BiocGenerics_0.8.0   DBI_0.2-7
>>   [6] IRanges_1.20.3       parallel_3.0.1       plyr_1.8             RSQLite_0.11.4       splines_3.0.1
>> [11] stats4_3.0.1         stringr_0.6.2        survival_2.37-4      tools_3.0.1          XML_3.98-1.1
>> [16] xtable_1.7-1
>>
>>
>>
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list