[BioC] IlluminaHumanMethylation450k.db: Missing probes in IlluminaHumanMethylation450kPROBELOCATION function?

James W. MacDonald jmacdon at uw.edu
Mon Jul 1 20:37:54 CEST 2013


Hi Simone,

On 7/1/2013 1:35 PM, Simone wrote:
> Hello!
>
> A little question: I've got a table with beta values obtained by
> Illumina's 450K BeadChip microarray and want to know for each probe
> where in the gene it is located (mainly promoter region or gene body).
> I found the IlluminaHumanMethylation450kPROBELOCATION function in the
> IlluminaHumanMethylation450k.db package which seems to do what I want,
> but not for all the probes.
> In detail, I have data for 473,029 probes, but the function only
> returns values for 354,770 unique probes, so 118,259 are missing.
> Is this expected behaviour? Should I use another package to get
> location information for all the probes contained in my file?

You would be better off using the FDb.InfiniumMethylation.hg19 package, 
which not only has all the locations for the probes, but also has them 
in a more useful format.

 > library(FDb.InfiniumMethylation.hg19)
 > x <- get450k()
Warning message:
In if (is.na(genome(GR))) { :
   the condition has length > 1 and only the first element will be used
 > x
GRanges with 485577 ranges and 7 metadata columns:
              seqnames           ranges strand | addressA addressB channel
                 <Rle>        <IRanges>  <Rle> |    <Rle>    <Rle>   <Rle>
   cg13869341     chr1   [15865, 15866]      * | 62703328 16661461     Red
   cg14008030     chr1   [18827, 18828]      * | 27651330     <NA>    Both
   cg12045430     chr1   [29407, 29408]      * | 25703424 34666387     Red
   cg20826792     chr1   [29425, 29426]      * | 61731400 14693326     Red
   cg00381604     chr1   [29435, 29436]      * | 26752380 50693408     Red

You can do all kinds of cool things with a GRanges object that you 
cannot do with simple location data. But this comes at a cost of 
complexity, so you will need to do some reading. I would recommend at a 
minimum that you read the vignettes for GenomicFeatures, and look at the 
help page for this package (?FDb.InfiniumMethylation.hg19).
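
As a rough sketch of the kind of thing you can do (I have not run this 
against your data): assume your probe IDs are in a character vector, 
here called ids (a made-up name), and take "promoter" to mean 2000 bp 
upstream to 200 bp downstream of a transcript start, which is an 
arbitrary choice on my part. The TxDb.Hsapiens.UCSC.hg19.knownGene 
package is just one possible source of gene models, not something the 
methylation package requires.

```r
library(FDb.InfiniumMethylation.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

probes <- get450k()

# 'ids' is a hypothetical stand-in for the probe IDs in your beta table
ids <- c("cg13869341", "cg14008030", "cg12045430")

# How many of your probes have a location in this package?
table(ids %in% names(probes))

# Keep only the probes present in your data
mine <- probes[intersect(ids, names(probes))]

# Gene models on hg19; the promoter window is an assumption, adjust to taste
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
prom <- promoters(txdb, upstream = 2000, downstream = 200)
tx   <- transcripts(txdb)

# Flag each probe by whether it overlaps any promoter or any transcript
mcols(mine)$inPromoter   <- overlapsAny(mine, prom)
mcols(mine)$inTranscript <- overlapsAny(mine, tx)
mine
```

A probe can be TRUE for both columns, since a promoter window as 
defined here extends into the transcript, so you will have to decide 
how you want to classify those.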

The current version of the package is built only against hg19 (GRCh37), 
so there is no ambiguity about which genome build you are getting.

Best,

Jim




>
> And secondly: As the IlluminaHumanMethylation450k.db package seems to
> be deprecated, on which build of the genome is the
> IlluminaHumanMethylation450kPROBELOCATION information based? I
> see that the package has separate functions for builds 36
> and 37, but the help page for the function in question only
> says that the mappings are based on Illumina data
> from January 2011. That should mean GRCh37, the build I have to
> work with in this case, but I am not completely sure because the
> documentation does not say so explicitly.
>
> Best regards,
> Simone
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] sqldf_0.4-6.4                         RSQLite.extfuns_0.0.1
>   [3] chron_2.3-43                          gsubfn_0.6-5
>   [5] proto_0.3-10                          RColorBrewer_1.0-5
>   [7] illuminaHumanv2.db_1.18.0             IlluminaHumanMethylation450k.db_2.0.7
>   [9] IlluminaHumanMethylation27k.db_1.4.7  org.Hs.eg.db_2.9.0
> [11] RSQLite_0.11.4                        DBI_0.2-7
> [13] AnnotationDbi_1.22.6                  affy_1.38.1
> [15] Biobase_2.20.0                        BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0         AnnotationForge_1.2.1 BiocInstaller_1.10.2  IRanges_1.18.1
> [5] preprocessCore_1.22.0 stats4_3.0.1          tcltk_3.0.1           tools_3.0.1
> [9] zlibbioc_1.6.0

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
