[BioC] ReportingTools error

Sun Apr 14 10:38:33 CEST 2013

Dear Jason,

In my own use of limma, I never subset topTable output by rowname, so it
isn't clear to me why ReportingTools needs to do this.  To me, topTable
already does the desired subsetting, so further subsetting should never be
required.

More generally, I am unclear why ReportingTools needs to operate on an
MArrayLM object.  Why not operate on the topTable() output directly?
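
To make the subsetting issue concrete, here is a minimal base R sketch. It
uses toy data rather than limma or ReportingTools code: the names
expression.dat and df are borrowed from the snippet quoted below, and the
probe IDs and values are invented. It assumes the result's row names are the
original row numbers stored as character strings, as reported for topTable()
in this thread.

```r
# Toy expression matrix: rows are probes, columns are samples.
# Probe IDs and values are made up for illustration.
expression.dat <- matrix(
  c(7.1, 7.3, 8.0, 8.2,
    5.0, 5.1, 5.2, 4.9,
    9.4, 9.6, 6.1, 6.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_a", "probe_b", "probe_c"),
                  c("s1", "s2", "s3", "s4"))
)

# Stand-in for a topTable()-style result (assumption: row names are the
# original row numbers, held as character strings, in ranked order).
df <- data.frame(logFC = c(-3.45, 0.9), row.names = c("3", "1"))

# Indexing by the character row name looks for a probe literally named "3",
# which does not exist, so R signals an error; capture its message here.
by.name <- tryCatch(expression.dat[rownames(df)[1], ],
                    error = function(e) conditionMessage(e))

# Converting the row names to integers indexes by position instead.
by.number <- expression.dat[as.integer(rownames(df)), , drop = FALSE]
```

Here by.name holds the captured error message, while by.number holds the
rows for probe_c and probe_a in the ranked order of df.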

Regards
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Fri, 12 Apr 2013, Jason Hackney wrote:

> Hi Jim,
>
> My concern about relying on the ID field is that it isn't always there. For
> instance, when I add featureData to an eSet, I almost always specify that
> the ID is a ProbeID for a microarray or a GeneID if I'm using some other
> identifier as my featureNames. When lmFit is called, the genes data.frame
> now doesn't have an ID column.
>
> What I might do is try to detect the ID column in either case, and use it
> if it's present.
>
> I expect that when/if topTableF and topTable become concordant in their
> row.names, I'll know about it, because one of my unit tests expects them
> to be discordant and will fail.
>
> Cheers,
>
> Jason
>
> On Fri, Apr 12, 2013 at 6:33 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Jason,
>>
>> I see the same thing - I had an email exchange with Gordon back in
>> February and he agreed that the row.names of the output from topTable and
>> topTableF should be the same thing, and it looked like he was leaning
>> towards using the row numbers. Given the speed with which he updates things
>> in limma, I assumed this happened approximately 13 nanoseconds later, but
>> evidently it either fell through the cracks or he had a change of mind
>> (Gordon is cc'ed).
>>
>> But I wonder if the ID column is a better way to go anyway.
>>
>> Gordon - what is the safest way to use data from either topTable or
>> topTableF to extract the corresponding raw data from the input object? Is
>> the ID column guaranteed to always correspond to the row.names or
>> featureNames of the data passed into lmFit?
>>
>> Best,
>>
>> Jim
>>
>> > sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>  [1] ReportingTools_2.1.2      knitr_1.2
>>  [3] lattice_0.20-15           affycoretools_1.32.0
>>  [5] KEGG.db_2.9.0             GO.db_2.9.0
>>  [7] AnnotationDbi_1.22.1      affy_1.38.0
>>  [9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
>> [11] DBI_0.2-5                 limma_3.16.1
>> [13] oligo_1.24.0              Biobase_2.20.0
>> [15] oligoClasses_1.22.0       BiocGenerics_0.6.0
>>
>> [snip]
>>
>>
>>
>>
>> On 4/11/2013 9:24 PM, Jason Hackney wrote:
>>
>>> Hi Jim,
>>>
>>> Could you send me your sessionInfo? I'm having trouble replicating this
>>> bug. I'm still getting probe names for topTableF and row numbers for
>>> topTable, as of limma_3.16.1. I'll pop in a bug fix to the ReportingTools
>>> trunk tomorrow, once I get the limma version sorted.
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>> On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>>     Hi,
>>>
>>>     I am getting an error when trying to create HTML pages with
>>>     ReportingTools, using an MArrayLM object as input. The error I get is
>>>
>>>     Error in expression.dat[probe, ] : subscript out of bounds
>>>
>>>     which appears to come from .make.gene.plots(), specifically here:
>>>
>>>      for (probe in rownames(df)) {
>>>             if ("Symbol" %in% colnames(df)) {
>>>                 ylab <- paste(df[probe, "Symbol"], ylab.type)
>>>             }
>>>             else {
>>>                 ylab <- paste(probe, ylab.type)
>>>             }
>>>             bigplot <- stripplot(expression.dat[probe, ] ~ factor,
>>>
>>>     The problem being that the rownames for a topTable object will be
>>>     the row numbers of the MArrayLM object from whence the data came
>>>     (this was recently harmonized by Gordon Smyth, so the row.names
>>>     will always be the row number, regardless of using topTable() or
>>>     topTableF()).
>>>
>>>     In other words, it appears that probe is assumed to be the row
>>>     name, when in fact it will be the row number. So something like
>>>
>>>     for(probe in as.numeric(rownames(df))){
>>>
>>>     should do the trick.
>>>
>>>     Best,
>>>
>>>     Jim
>>>
>>>
>>>
>>>     --
>>>     James W. MacDonald, M.S.
>>>     Biostatistician
>>>     University of Washington
>>>     Environmental and Occupational Health Sciences
>>>     4225 Roosevelt Way NE, # 100
>>>     Seattle WA 98105-6099
>>>
>>>
>>>
>>>
>>> --
>>> Jason A. Hackney, Ph.D.
>>> Bioinformatics and Computational Biology
>>> Genentech
>>> hackney.jason at gene.com
>>> 650-467-5084
>>>
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>
>
> -- 
> Jason A. Hackney, Ph.D.
> Bioinformatics and Computational Biology
> Genentech
> hackney.jason at gene.com
> 650-467-5084
>
