[BioC] package pair "hugene10stv1cdf"/"hugene10stprobeset.db"

Laurent Gautier laurent at cbs.dtu.dk
Tue May 4 00:00:55 CEST 2010


Hi Marc,

What I am reading translates into very little confidence in anything
related to hugene 1.0ST in the Bioconductor "affy" pipeline. I really
think it should be harder to use it without going through steps that
make one explicitly see that this is untested/not recommended/unsafe.
The CDF seems to be of uncertain quality to everyone, yet it is
provided by Bioconductor. A warning message, or a recommendation to
switch to oligo, displayed when the package is attached would be
helpful, I think.
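A minimal sketch of the attach-time warning suggested above, assuming it would live in the CDF package itself. Base R runs a package's .onAttach() hook when the package is attached, and packageStartupMessage() emits a message that users can silence with suppressPackageStartupMessages(). The file name R/zzz.R and the message wording are hypothetical; this is not code that hugene10stv1cdf actually ships.

```r
# Hypothetical R/zzz.R for a CDF package built from Affymetrix's
# unsupported CDF file (illustrative only, not the real package code).
# .onAttach() is called by library()/require() when the package is attached.
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    pkgname, " is built from an unsupported Affymetrix CDF file.\n",
    "Its probe set definitions have not been validated; consider the ",
    "pdInfoBuilder/oligo pipeline instead."
  )
}
```

packageStartupMessage() is used rather than warning() because it is the conventional channel for attach-time notices and can be suppressed selectively without hiding other warnings.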

Best,


Laurent



On 5/3/10 7:07 PM, Marc Carlson wrote:
> Hi Laurent,
>
> Further complicating things, the hugene10stprobeset.db package was a
> contributed package.  From the DESCRIPTION file you can see that it was
> contributed by Arthur Li.  You might want to ask him for more details
> about this package and also about the hugene10sttranscriptcluster.db
> package, because for that package I get the following:
>
>
> summary(Lkeys(hugene10sttranscriptclusterSYMBOL) %in% ls(hugene10stv1cdf))
>
>     Mode   FALSE    TRUE    NA's
> logical     962   32295       0
>
> summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10sttranscriptclusterSYMBOL))
>
>     Mode   FALSE    TRUE    NA's
> logical      26   32295       0
>
>
> And this looks like a closer match for what you are doing (considering
> that we don't have a properly supported cdf file in this case).
>
> Hope this helps,
>
>
>    Marc
>
>
>
> On 05/03/2010 09:28 AM, Laurent Gautier wrote:
>    
>> Hi James,
>>
>> Thanks for the clarifications. I am happy to see that Affymetrix has
>> picked up the concept of alternative CDF definitions and makes it
>> easier for its users.
>>
>> Regarding Bioconductor, wouldn't it make sense either to mark such
>> packages as "unsupported" or, better, to move them to a different
>> location, making it less likely that unaware users download them? In
>> the present case, should the CDF package be placed outside the main
>> repository?
>>
>> In addition, wouldn't it make sense to coordinate the release of
>> probe/probeset mapping structures and annotation files? (I am reading
>> below that the annotation is for revision 5 while the mapping is for
>> revision 4.)
>> What about making the revision number a documented, _non-exported_
>> vector in the packages? This way one could, for example, do:
>>
>>    > hugene10stprobeset.db:::revision
>>    [1] "r5"
>>
>> (Keeping the vector non-exported avoids masking conflicts on the
>> search path whenever several attached packages define a variable
>> named "revision".)
>>
>> Best,
>>
>>
>> Laurent
>>
>>
>>
>> On 03/05/10 17:05, James W. MacDonald wrote:
>>      
>>> Hi Laurent,
>>>
>>> Laurent Gautier wrote:
>>>        
>>>> Dear List,
>>>>
>>>> I am noting potential issues in the package pair
>>>> "hugene10stv1cdf"/"hugene10stprobeset.db", as their respective sets
>>>> of probe set IDs overlap only partially:
>>>>
>>>> > library(hugene10stv1cdf)
>>>> > library(hugene10stprobeset.db)
>>>> > summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10stprobesetSYMBOL))
>>>>     Mode   FALSE    TRUE    NA's
>>>> logical   28026    4295       0
>>>> > summary(Lkeys(hugene10stprobesetSYMBOL) %in% ls(hugene10stv1cdf))
>>>>     Mode   FALSE    TRUE    NA's
>>>> logical  252727    4295       0
>>>>
>>>> Reading closely, one can observe that "hugene10stprobeset.db" refers
>>>> to a "revision 5" while the "v1" in "hugene10stv1cdf" suggests a
>>>> revision 1. It is unclear to me whether this is linked to the
>>>> problem, but if so, there is neither a hugene10stv5cdf package nor an
>>>> annotation package for v1.
>>>>          
>>> It's hard to say what the 'revision 5' refers to. There is only one
>>> HuGene chip, and it is the version 1. There _have_ been nine versions
>>> of the annotation file released by Affy (Releases 22-30), so there is
>>> no telling what 'revision 5' refers to. But certainly it doesn't
>>> refer to a HuGene-1_0-st-v5 chip, as no such thing exists.
>>>
>>> I have a personal thesis that the Exon and Gene chips contain all
>>> manner of extra sequences that Affy threw on there so they wouldn't
>>> have the same problem they had with their 3'-biased chips, namely
>>> that the chips were out-of-date the minute they finished the first
>>> production run because the annotations are so fluid. Now they can
>>> simply take the original 32K probesets and slice-n-dice them at will
>>> to make things that match up with the genome as we know it now.
>>>
>>> But back to the point at hand. The problem with hugene10stv1cdf is
>>> that it is based on the _unsupported_ cdf file that Affy makes
>>> available. We make it available as well, for those who insist on
>>> using the makecdfenv/affy pipeline, rather than the
>>> pdInfoBuilder/oligo pipeline, which is what one should arguably be
>>> using. Given that the data being used to create the cdf package is
>>> specifically unsupported, caveat emptor.
>>>
>>> I note that the supported library files do contain an 'r4' in the
>>> file name, so I would assume, without any backing data, that those
>>> library files hew more closely to the annotation data Affy supplies.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>        
>>>> The obligatory sessionInfo() is:
>>>>
>>>> > sessionInfo()
>>>> R version 2.11.0 Patched (2010-04-24 r51813)
>>>> i686-pc-linux-gnu
>>>>
>>>> locale:
>>>>   [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>>>>   [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>>>>   [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>>>>   [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>>>>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>   [1] oligo_1.12.0                AffyCompatible_1.8.0
>>>>   [3] RCurl_1.4-1                 bitops_1.0-4.1
>>>>   [5] XML_2.8-1                   oligoClasses_1.10.0
>>>>   [7] limma_3.4.0                 hugene10stv1cdf_2.6.0
>>>>   [9] hugene10stprobeset.db_5.0.1 org.Hs.eg.db_2.4.1
>>>> [11] RSQLite_0.8-4               DBI_0.2-5
>>>> [13] AnnotationDbi_1.10.0        affxparser_1.20.0
>>>> [15] affy_1.26.0                 Biobase_2.8.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.16.0         Biostrings_2.16.0     IRanges_1.6.0
>>>> [4] preprocessCore_1.10.0 splines_2.11.0        tcltk_2.11.0
>>>> [7] tools_2.11.0
>>>>
>>>> Best,
>>>>
>>>>
>>>> Laurent
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>          
>>>        
>>
>>      
>
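The overlap checks quoted in this thread run summary(x %in% y) once in each direction. As a sketch, the same counts can be gathered in one call with base R's unique(), intersect() and setdiff(); the helper name id_overlap() and the toy IDs are hypothetical, not real HuGene probe set IDs. For vectors without duplicates, common and only_a equal the TRUE and FALSE counts of summary(ids_a %in% ids_b).

```r
# Hypothetical helper: count shared and unshared IDs between two
# character vectors, e.g. ls(hugene10stv1cdf) and
# Lkeys(hugene10sttranscriptclusterSYMBOL) from the sessions above.
id_overlap <- function(ids_a, ids_b) {
  ids_a <- unique(ids_a)
  ids_b <- unique(ids_b)
  c(common = length(intersect(ids_a, ids_b)),  # IDs present in both
    only_a = length(setdiff(ids_a, ids_b)),    # IDs missing from ids_b
    only_b = length(setdiff(ids_b, ids_a)))    # IDs missing from ids_a
}

# Toy usage with made-up IDs.
cdf_ids   <- c("7892501", "7892502", "7892503", "7892504")
annot_ids <- c("7892503", "7892504", "7892505")
id_overlap(cdf_ids, annot_ids)
# returns common = 2, only_a = 2, only_b = 1
```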


