[BioC] error preProcessGeneST ArrayTools (R 2.15.2, BioC 2.11)

James W. MacDonald jmacdon at uw.edu
Thu Dec 20 23:18:38 CET 2012


Hi Jose,

On 12/20/2012 4:55 PM, José López wrote:
> Hi Jim,
>
> Thank you for your quick reply and your useful explanation.
> In fact, I was using the function to subset the eset because I didn’t 
> know any other way to do it before.
>
> By the way, I would like to know your opinion on whether it is a good 
> option to filter out controls before computing moderated t-statistics 
> (limma), or whether I should rather run the statistics on the whole 
> dataset (including controls).

My understanding is that you should filter after fitting the model when 
using limma, because you will bias the empirical Bayes estimate if you 
filter first. So that is what I do with these arrays.

>
> I have read that, in contrast to the standard t-statistic, common 
> filter/test pairs do not necessarily translate into power gains when 
> moderated t-statistics are used (Bourgon, Gentleman and Huber, 
> 2009), so following their indications, I don’t usually apply any filter 
> on genes. The question is whether you think it is good practice to 
> remove control probes (not genes) or not.

I filter the control probes, mainly because they have a bad habit of 
popping up in a list of differentially expressed genes. This was much 
rarer with the 3' biased arrays, and way more obvious since those 
controls had a big AFFX prefix on the probeset ID.
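
As a small illustration of why the 3' array controls were easy to spot, 
the sketch below drops anything whose ID starts with AFFX. The ID vector 
is made up for the example and only mimics the style of 3' array 
probeset IDs; nothing here applies to the Gene ST arrays, where the 
controls carry no such prefix.

```r
# Illustrative probeset IDs in the style of a 3' biased array (not real data)
ids <- c("1415670_at", "AFFX-BioB-5_at", "1415671_at", "AFFX-CreX-3_at")

# Controls are recognizable by the AFFX prefix
is_control <- grepl("^AFFX", ids)

# Keep everything that is not a control
ids_no_control <- ids[!is_control]
ids_no_control
```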

What I generally see is that the intronic controls often appear to be 
differentially expressed. I can come up with several hypotheses as to 
why this is so. The primary one is that total RNA will likely also 
include pre-mRNA that hasn't yet been spliced to remove the introns, so 
if one sample is more actively expressing a gene, you may well end up 
with intronic sequence being converted into cDNA and then hybridized to 
the chip.

Regardless, these are supposed to be controls, and are not really 
annotated, and it is hard to explain when they pop up in lists of 
differentially expressed genes. So I take the easy way out and nuke them 
right after fitting the model.
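
To make the "fit first, then filter" order concrete, here is a minimal 
sketch on simulated data. The expression matrix, the group labels and 
the control IDs are all made up for illustration; with real Mouse Gene 
1.0 ST data the IDs would come from mogene10stCONTROL$probeset_id. It 
assumes limma is installed and relies on the fact that a fitted 
MArrayLM object can be subset by row.

```r
library(limma)

set.seed(1)

# Simulated log2 expression matrix: 100 probesets x 6 arrays (hypothetical)
expr <- matrix(rnorm(600, mean = 8), nrow = 100,
               dimnames = list(paste0("ps", 1:100), paste0("s", 1:6)))

# Two groups of three arrays each
group <- factor(rep(c("ctl", "trt"), each = 3))
design <- model.matrix(~ group)

# Hypothetical control probeset IDs (stand-in for mogene10stCONTROL$probeset_id)
control_ids <- paste0("ps", 91:100)

# Fit the model and run eBayes on ALL probesets, controls included,
# so the empirical Bayes prior is estimated from the full data
fit <- eBayes(lmFit(expr, design))

# Only now drop the controls from the fitted object
keep <- !rownames(fit$coefficients) %in% control_ids
fit_no_control <- fit[keep, ]

# The multiplicity adjustment is then done over the remaining probesets
top <- topTable(fit_no_control, coef = 2, number = 5)
top
```

Because topTable() is called on the subsetted fit, the adjusted 
p-values are computed over the non-control probesets only, while the 
moderated variances still reflect the full set.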

Best,

Jim




>
> Sorry if the question is not clearly explained.
>
> Thank you in advance for your time and your help.
>
> Best,
>
> Jose
>
> On Dec 20, 2012, at 7:00 p.m., James W. MacDonald wrote:
>
>> Hi Jose,
>>
>> On 12/20/2012 12:20 PM, José LÓPEZ wrote:
>>> Dear all,
>>>
>>> I was trying to use preProcessGeneST from ArrayTools to get rid of 
>>> control probes in Mouse Gene 1.0ST arrays, but it doesn't work in 
>>> the latest R/BioC version.
>>> It was working perfectly in the previous R/BioC version. Should I 
>>> downgrade to the previous version to continue using the ArrayTools 
>>> package? Or is it possible that the error has a different cause?
>>
>> This looks like a bug in the current version of the 
>> mogene10sttranscriptcluster.db package, as the MAP object appears to 
>> be missing:
>>
>> > ls(2)
>> [1] "mogene10sttranscriptcluster"
>> <snip>
>> [20] "mogene10sttranscriptclusterGO2ALLPROBES"
>> [21] "mogene10sttranscriptclusterGO2PROBE"
>> [22] "mogene10sttranscriptclusterMAPCOUNTS"
>> [23] "mogene10sttranscriptclusterMGI"
>> <snip>
>>
>> So I don't think downgrading anything will help - we just need to 
>> rebuild this package.
>>
>> But this brings me to a different question. The function you are 
>> using is intended to annotate things and then output in the current 
>> directory, and removing control probes is just a side effect of one 
>> argument. So are you trying to annotate, or to remove control probes?
>>
>> If you just want to remove control probes, note that you can do
>>
>> > data(mogene10stCONTROL)
>>
>> and then you can subset your eset using the resulting data.frame:
>>
>> eset_no_control <- eset_norm[!featureNames(eset_norm) %in% 
>> mogene10stCONTROL$probeset_id,]
>>
>> Note the use of the bang (!) preceding featureNames - we want to 
>> remove these things, not select for them.
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> Thank you for your kind help,
>>>
>>>> eset_process = preProcessGeneST(eset_norm, output = TRUE)
>>> Warning message:
>>> In chkPkgs(chip) :
>>> The mogene10sttranscriptcluster.db package does not appear to 
>>> contain annotation data.
>>> Error in function (x, envir, mode = "any", ifnotfound = 
>>> list(function(x) stop(paste0("value for '", :
>>> error in evaluating the argument 'envir' in selecting a method for 
>>> function 'mget': Error: object 'mogene10sttranscriptclusterMAP' not 
>>> found
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] mogene10stv1cdf_2.11.0 annaffy_1.30.0 KEGG.db_2.8.0
>>> [4] GO.db_2.8.0 arrayQualityMetrics_3.14.0 ArrayTools_1.18.0
>>> [7] mogene10sttranscriptcluster.db_8.0.1 org.Mm.eg.db_2.8.0 
>>> RSQLite_0.11.2
>>> [10] DBI_0.2-5 affy_1.36.0 annotate_1.36.0
>>> [13] AnnotationDbi_1.20.3 vsn_3.26.0 Biobase_2.18.0
>>> [16] BiocGenerics_0.4.0 limma_3.14.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.26.0 affyPLM_1.34.0 beadarray_2.8.1 
>>> BeadDataPackR_1.10.0 BiocInstaller_1.8.3
>>> [6] Biostrings_2.26.2 Cairo_1.5-2 cluster_1.14.3 colorspace_1.2-0 
>>> gcrma_2.30.0
>>> [11] genefilter_1.40.0 grid_2.15.2 Hmisc_3.10-1 hwriter_1.3 
>>> IRanges_1.16.4
>>> [16] lattice_0.20-10 latticeExtra_0.6-24 parallel_2.15.2 plyr_1.8 
>>> preprocessCore_1.20.0
>>> [21] RColorBrewer_1.0-5 reshape2_1.2.2 setRNG_2011.11-2 
>>> splines_2.15.2 stats4_2.15.2
>>> [26] stringr_0.6.2 survival_2.37-2 SVGAnnotation_0.93-1 tools_2.15.2 
>>> XML_3.95-0.1
>>> [31] xtable_1.7-0 zlibbioc_1.4.0
>>>> class(H2Bgfp_norm)
>>> [1] "ExpressionSet"
>>> attr(,"package")
>>> [1] "Biobase"
>>>
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


