[BioC] package pair "hugene10stv1cdf"/"hugene10stprobeset.db"

Laurent Gautier laurent at cbs.dtu.dk
Mon May 3 18:28:26 CEST 2010


Hi James,

Thanks for the clarifications. I am happy to see that Affymetrix has 
picked up the concept of alternative CDF definitions and is making this 
easier for its users.

Regarding Bioconductor, wouldn't it make sense either to mark such 
packages as "unsupported" or, better, to move them to a different 
location, making it less likely that unaware users download them? In 
the present case, should the CDF package be placed outside the main 
repository?

In addition, wouldn't it make sense to coordinate the release of 
probe/probeset mapping structures and annotation files? (I read below 
that the annotation is for revision 5 while the mapping is for 
revision 4.)
What about making the revision number a documented _non-exported_ vector 
in the packages?
This way one could, for example, do:
 > hugene10stprobeset.db:::revision
[1] "r5"
(Keeping the vector non-exported avoids name clashes whenever several 
attached packages each define a variable named "revision" and end up on 
the search path.)
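
A minimal sketch of the proposal in current R. It assumes, hypothetically, that each package defines a character vector `revision` in its R code and simply leaves it out of its NAMESPACE exports; no package in this thread is known to do so. The helper names `getRevision()` and `revisionsAgree()` are made up for illustration. The lookup reads the same object that `:::` would, using only base functions (`requireNamespace()`, `asNamespace()`, `exists()`, `get()`), but returns `NA` instead of failing when the package or the variable is missing.

```r
## Assumed (hypothetical) package layout:
##   R/revision.R :  revision <- "r5"
##   NAMESPACE    :  'revision' is not listed in any export() directive

## Return the non-exported 'revision' of a package, or NA if unavailable
getRevision <- function(pkg) {
  ## Package not installed: nothing to look up
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  ns <- asNamespace(pkg)
  ## Search only the package namespace itself, not enclosing environments
  if (exists("revision", envir = ns, inherits = FALSE)) {
    as.character(get("revision", envir = ns, inherits = FALSE))[1L]
  } else {
    NA_character_
  }
}

## TRUE only if every package declares a revision and all of them match
revisionsAgree <- function(pkgs) {
  revs <- vapply(pkgs, getRevision, character(1))
  !anyNA(revs) && length(unique(revs)) == 1L
}

## Hypothetical usage with the packages from this thread:
## revisionsAgree(c("hugene10stv1cdf", "hugene10stprobeset.db"))
```

Treating a missing `revision` as a mismatch (rather than silently passing) is a deliberate choice in this sketch: a CDF/annotation pair is only reported as consistent when both sides state the same revision.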

Best,


Laurent



On 03/05/10 17:05, James W. MacDonald wrote:
> Hi Laurent,
>
> Laurent Gautier wrote:
>> Dear List,
>>
>> I have noticed a potential issue with the package pair 
>> "hugene10stv1cdf"/"hugene10stprobeset.db": their respective sets of 
>> probe set IDs barely overlap:
>>
>> > library(hugene10stv1cdf)
>> > library(hugene10stprobeset.db)
>> > summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10stprobesetSYMBOL))
>>    Mode   FALSE    TRUE    NA's
>> logical   28026    4295       0
>> > summary(Lkeys(hugene10stprobesetSYMBOL) %in% ls(hugene10stv1cdf))
>>    Mode   FALSE    TRUE    NA's
>> logical  252727    4295       0
>>
>> Looking closely, one can see that "hugene10stprobeset.db" refers 
>> to a "revision 5", while the "v1" in "hugene10stv1cdf" suggests a 
>> revision 1. It is unclear to me whether this is linked to the 
>> problem, but if it is, then there is neither a hugene10stv5cdf 
>> package nor an annotation package for v1.
>
> It's hard to say what the 'revision 5' refers to. There is only one 
> HuGene chip, and it is the version 1. There _have_ been nine versions 
> of the annotation file released by Affy (Releases 22-30), so there is 
> no telling what 'revision 5' refers to. But certainly it doesn't refer 
> to a HuGene-1_0-st-v5 chip, as no such thing exists.
>
> I have a personal theory that the Exon and Gene chips contain all 
> manner of extra sequences that Affy threw on there so they wouldn't 
> have the same problem they had with their 3'-biased chips, namely that 
> the chips were out of date the minute the first production run was 
> finished, because the annotations are so fluid. Now they can simply 
> take the original 32K probesets and slice and dice them at will to 
> make things that match up with the genome as we know it now.
>
> But back to the point at hand. The problem with hugene10stv1cdf is 
> that it is based on the _unsupported_ cdf file that Affy makes available. 
> We make it available as well, for those who insist on using the 
> makecdfenv/affy pipeline, rather than the pdInfoBuilder/oligo 
> pipeline, which is what one should arguably be using. Given that the 
> data being used to create the cdf package is specifically unsupported, 
> caveat emptor.
>
> I note that the supported library files do contain an 'r4' in the file 
> name, so I assume, without any supporting data, that this library 
> would match the annotation data they supply more closely.
>
> Best,
>
> Jim
>
>
>>
>> The obligatory sessionInfo() is:
>>
>> > sessionInfo()
>> R version 2.11.0 Patched (2010-04-24 r51813)
>> i686-pc-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] oligo_1.12.0                AffyCompatible_1.8.0
>>  [3] RCurl_1.4-1                 bitops_1.0-4.1
>>  [5] XML_2.8-1                   oligoClasses_1.10.0
>>  [7] limma_3.4.0                 hugene10stv1cdf_2.6.0
>>  [9] hugene10stprobeset.db_5.0.1 org.Hs.eg.db_2.4.1
>> [11] RSQLite_0.8-4               DBI_0.2-5
>> [13] AnnotationDbi_1.10.0        affxparser_1.20.0
>> [15] affy_1.26.0                 Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0         Biostrings_2.16.0     IRanges_1.6.0
>> [4] preprocessCore_1.10.0 splines_2.11.0        tcltk_2.11.0
>> [7] tools_2.11.0
>> >
>>
>> Best,
>>
>>
>> Laurent
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
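
The overlap check in the quoted message can be wrapped in a small helper that reports counts in both directions plus the fraction of the first set that is shared. `idOverlap()` is a hypothetical convenience function, not part of any Bioconductor package, and the IDs in the example are made up. Applied to the figures quoted above, only 4295 of the 32321 probe sets in hugene10stv1cdf (28026 + 4295), about 13%, appear among the keys of hugene10stprobesetSYMBOL.

```r
## Hypothetical helper: summarise the overlap between two sets of IDs
idOverlap <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  shared <- intersect(a, b)
  c(onlyA  = length(setdiff(a, b)),      # in 'a' but not in 'b'
    shared = length(shared),             # in both
    onlyB  = length(setdiff(b, a)),      # in 'b' but not in 'a'
    fracA  = length(shared) / length(a)) # share of 'a' also found in 'b'
}

## Made-up IDs, for illustration only:
## onlyA = 2, shared = 1, onlyB = 1, fracA = 1/3
idOverlap(c("ps1", "ps2", "ps3"), c("ps3", "ps4"))

## With the packages from the thread installed, the equivalent of the
## quoted check would be:
## library(hugene10stv1cdf)
## library(hugene10stprobeset.db)
## idOverlap(ls(hugene10stv1cdf), Lkeys(hugene10stprobesetSYMBOL))
```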


