[BioC] IlluminaHumanMethylation450k.db Reference Versions

Fri Nov 11 01:58:51 CET 2011

All of the mappings (except those that Tim has previously "adjusted", 
such as CHR) will "hide" the probes that match multiple Entrez Gene IDs.  
This happens because the 450k.db was made as a chip package, and chip 
packages were specifically designed to hide such probes by default.  
They behave this way because chip packages were originally designed 
primarily for mRNA microarray platforms.  So the default behavior is not 
really broken, or even really inappropriate.  It's just that this is an 
atypical use case.  But the data is all in there, and you absolutely CAN 
get to it with very little trouble.  You just have to use the 
toggleProbes method to expose it.

You can use it like this:

## load the annotation package first
library(IlluminaHumanMethylation450k.db)

## step 1: create a mapping that exposes ALL the probes, regardless of
## how many genes they match:
fullAliasMapping <-
  toggleProbes(IlluminaHumanMethylation450kALIAS2PROBE, "all")

## step 2: use that mapping instead of
## IlluminaHumanMethylation450kALIAS2PROBE:
head(toTable(fullAliasMapping))

## You can compare the two mappings to see how they behave differently:
dim(toTable(IlluminaHumanMethylation450kALIAS2PROBE))
dim(toTable(fullAliasMapping))
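
As a further sketch along the same lines: besides "all", toggleProbes 
also accepts "single" and "multiple", so you can look specifically at the 
probes that are hidden by default.  The variable names below are only 
illustrative, and the counts you see will depend on the package version 
you have installed.

```r
library(IlluminaHumanMethylation450k.db)

aliasMap <- IlluminaHumanMethylation450kALIAS2PROBE

## three views of the same underlying data
singleOnly <- toggleProbes(aliasMap, "single")    # probes matching one gene
multiOnly  <- toggleProbes(aliasMap, "multiple")  # probes matching several genes
allProbes  <- toggleProbes(aliasMap, "all")       # everything

## number of rows exposed by each view
rowCounts <- c(default  = nrow(toTable(aliasMap)),
               single   = nrow(toTable(singleOnly)),
               multiple = nrow(toTable(multiOnly)),
               all      = nrow(toTable(allProbes)))
print(rowCounts)

## a few of the rows that the default view does not show
head(toTable(multiOnly))
```

Comparing the "default" and "all" counts gives a quick idea of how much 
of the annotation you would miss by using the mapping as shipped.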

I understand that Tim is planning to modify this package so that its 
default behavior is more in line with what users of this platform 
expect, which is a terrific thing for him to do.  But in the meantime, 
the package is perfectly serviceable; you just have to know how to use 
the toggleProbes method.

   Marc

On 11/07/2011 07:09 AM, Tim Triche, Jr. wrote:
> CHR does what is expected of the mapping in that it returns the chromosome
> of the probe.   It is constructed by overwriting the bimap for CHR with
> that for CHR37 on export.  Without this kludge, tens of thousands of probes
> return NA as their chromosome, which is clearly incorrect.
>
> As it happens, due to a long-standing tradition of excluding 'promiscuous'
> probes, the default behavior of ALIAS2PROBE (for example) is also wrong.
> I'm about to upload 2.0.6 with that patched.
>
> The problem with gene-centric annotations of the sort used in Bioconductor
> .db packages is that they're gene-centric; the mapping from probes to
> genes, locations, chromosomes, GO annotations, KEGG pathways, and the like
> is done through EntrezGene IDs. There has been some discussion as to
> whether completely reannotating the chip might not be a better idea in this
> respect, i.e. mapping the probes to the nearest TSS.  As I have gained more
> experience with the GRanges architecture, I have realized that GRanges are
> the more sensible approach to annotating the probes on the 450k.
>
> Nonetheless, the 450k.db package is out there so it ought to do what it's
> expected to, unless or until everything transitions to the manifest package
> that Kasper and Martin Aryee put together.
>
>
> On Sun, Nov 6, 2011 at 11:00 PM, Dario Strbenac <D.Strbenac at garvan.org.au> wrote:
>
>> In the package IlluminaHumanMethylation450k.db, there are three data
>> objects relating probes to chromosomes. They are
>> IlluminaHumanMethylation450kCHR, IlluminaHumanMethylation450kCHR36, and
>> IlluminaHumanMethylation450kCHR37. I wonder what the reason for having
>> IlluminaHumanMethylation450kCHR is, and what reference was used, since that
>> is not explained in the help page of IlluminaHumanMethylation450kCHR. Is
>> it redundant?
>>
>> Also, the mapping to locations, IlluminaHumanMethylation450kCHRLOC, is
>> only available for hg19. There should also be one for hg18, or otherwise
>> the IlluminaHumanMethylation450kCHR36 should not be supported.
>>
>> I am referring to version 1.4.6 of the IlluminaHumanMethylation450k.db
>> package.
>>
>> --------------------------------------
>> Dario Strbenac
>> Research Assistant
>> Cancer Epigenetics
>> Garvan Institute of Medical Research
>> Darlinghurst NSW 2010
>> Australia