[BioC] lumiHumanAll.db - wrong information in lumiHumanAllCHR for some probes?

Janet Young jayoung at fhcrc.org
Fri Dec 30 01:14:49 CET 2011


Hi,

I'm working with lumiHumanAll.db and chromosomal locations using the CHR and CHRLOC tables.   

Mostly things turn out fine, but I think I have found some probes for which the information in CHR and CHRLOC doesn't match up. (I'm not sure whether I have found all the problem probes, or just a few that were most obvious because they seemed to map off the end of the chromosome.)

I'd guess this has something to do with how probes mapping to multiple locations are handled, which is tricky, but it seems important for CHR and CHRLOC to be internally consistent.
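Here is a minimal base-R sketch of the consistency check I have in mind. It assumes a data frame laid out like `mapping` in the code at the bottom of this email (columns `probe`, `chrLoc`, `chrsFromChrsList`, `chrsFromLocsList`, `chrLengthChrsList`); the function name `flagInconsistent` and the toy values are made up for illustration and are not part of any package.

```r
## Hypothetical helper (not from lumi, annotate or AnnotationDbi):
## return the rows of a mapping table whose CHR and CHRLOC information disagree.
flagInconsistent <- function(mapping) {
  ## TRUE where the chromosome from CHR differs from the one named in CHRLOC
  mapping$chrMismatch <- as.character(mapping$chrsFromChrsList) !=
    as.character(mapping$chrsFromLocsList)
  ## TRUE where the (unstranded) location lies past the end of the CHR chromosome
  mapping$pastEnd <- mapping$chrLoc > mapping$chrLengthChrsList
  ## which() drops NAs, so rows with missing information are not flagged
  mapping[which(mapping$chrMismatch | mapping$pastEnd), , drop = FALSE]
}

## toy table with made-up values: probeA is consistent, probeB is not
toy <- data.frame(
  probe             = c("probeA", "probeB"),
  chrLoc            = c(1000, 200000000),
  chrsFromChrsList  = c("1", "21"),
  chrsFromLocsList  = c("1", "2"),
  chrLengthChrsList = c(250000000, 50000000),
  stringsAsFactors  = FALSE
)
flagInconsistent(toy)
```

On the real table, I'd expect `flagInconsistent(mapping)` to return the rows for the four odd probes; it can only catch problems that show up as a chromosome mismatch or an out-of-range location.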

I've tried to explain everything with the code at the bottom of the email.

thanks very much,

Janet

------------------------------------------------------------------- 

Dr. Janet Young 

Tapscott and Malik labs

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org


------------------------------------------------------------------- 




library(lumiHumanAll.db)
library(org.Hs.eg.db)   # for org.Hs.egCHRLENGTHS (also attached via lumiHumanAll.db)
library(lumi)
library(annotate)

### these have mismatched CHR and CHRLOC info - I noticed them among a much larger set of probes
odd_mappers <- c("cS._E8f0CEAHsPH.oU", "3B5Dx.5FBcAstHt9Iw", "Ho.7bwAyQBWQ8f_RQU", "0k9AKLpXv97vAFU.rk")

### and a few other probes that looked fine
good_mappers <- c("Ku8QhfS0n_hIOABXuE", "fqPEquJRRlSVSfL.8A", "ckiehnugOno9d7vf1Q", "x57Vw5B5Fbt5JUnQkI")

probes <- c(odd_mappers,good_mappers)
probeType <- c( rep("odd",length(odd_mappers)), rep("good",length(good_mappers)) ) 

### get their map info from CHR and CHRLOC
chrs <- lookUp(probes, "lumiHumanAll.db", "CHR")
locs <- lookUp(probes, "lumiHumanAll.db", "CHRLOC")

### some probes have two locs, which is OK, but make sure we know which information to double up when we make a table later
numLocsPerProbe <- sapply(locs,length)

#### put that info into a table
mapping <- data.frame( probe=rep( probes, numLocsPerProbe),
    probeType=rep( probeType, numLocsPerProbe),
    chrLoc=abs(unlist(locs,use.names=FALSE)),  # ignore strand (negative CHRLOC values mean minus strand)
    chrsFromChrsList=rep(unlist(chrs,use.names=FALSE), numLocsPerProbe),  # assumes one CHR value per probe
    chrsFromLocsList=unlist(lapply(locs, names),use.names=FALSE)   )

#### looking at CHRLENGTHS was how I realized the CHR and CHRLOC info didn't agree for some probes: their locations lie far beyond the end of the chromosome given by CHR
mapping[,"chrLengthChrsList"] <- org.Hs.egCHRLENGTHS[ as.character(mapping[,"chrsFromChrsList"]) ]
mapping[,"chrLengthLocsList"] <- org.Hs.egCHRLENGTHS[ as.character(mapping[,"chrsFromLocsList"]) ]

#### add probe sequences 
mapping[,"seq"] <- id2seq(as.character(mapping[,"probe"]))

#### take a look at the table, and do some BLAT searches at the UCSC website to see where each probe really maps
mapping

### BLAT search results (these are the exact matches, but all of these probes also have other, non-exact matches)
# first probe cS._E8f0CEAHsPH.oU maps to chr10:56367644-56367693
# second probe 3B5Dx.5FBcAstHt9Iw maps to chr17:13446846-13446895
# third probe Ho.7bwAyQBWQ8f_RQU maps to chr7:34980375-34980424
# fourth probe 0k9AKLpXv97vAFU.rk maps to chr3:149699708-149699757
####### So in each of those cases it looks like lumiHumanAllCHR has the correct chromosome and CHRLOC is wrong (perhaps it took one of the secondary, non-exact matches?). Does that mean the locations on the correct chromosome are not available in any table?

#################
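To compare the BLAT hits listed above with the table programmatically, the UCSC coordinates can be parsed into columns. This is a base-R sketch assuming the `chrN:start-end` strings exactly as written in the comments; the helper name `parseUcscRange` is hypothetical.

```r
## Hypothetical helper: split UCSC-style "chrN:start-end" strings into columns.
## Strings that don't match the pattern give NA in every column.
parseUcscRange <- function(x) {
  m <- regmatches(x, regexec("^chr([^:]+):([0-9]+)-([0-9]+)$", x))
  data.frame(
    chr   = vapply(m, `[`, character(1), 2),
    start = as.numeric(vapply(m, `[`, character(1), 3)),
    end   = as.numeric(vapply(m, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

## exact BLAT matches for the four odd probes, copied from the comments above
blatHits <- c(
  "cS._E8f0CEAHsPH.oU" = "chr10:56367644-56367693",
  "3B5Dx.5FBcAstHt9Iw" = "chr17:13446846-13446895",
  "Ho.7bwAyQBWQ8f_RQU" = "chr7:34980375-34980424",
  "0k9AKLpXv97vAFU.rk" = "chr3:149699708-149699757"
)
blat <- data.frame(probe = names(blatHits),
                   parseUcscRange(unname(blatHits)),
                   stringsAsFactors = FALSE)
blat
```

After `merge(mapping, blat, by = "probe")`, the `chr` column can be compared directly with `chrsFromChrsList` and `chrsFromLocsList`. As I understand the annotation package documentation, CHRLOC gives the start of the associated gene rather than the probe position, so `chrLoc` isn't expected to fall between `start` and `end`.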


sessionInfo()

R version 2.14.0 (2011-10-31)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] annotate_1.32.1        lumi_2.6.0             nleqslv_1.9.1         
 [4] methylumi_2.0.1        lumiHumanAll.db_1.16.0 org.Hs.eg.db_2.6.4    
 [7] RSQLite_0.11.0         DBI_0.2-5              AnnotationDbi_1.16.10 
[10] Biobase_2.14.0        

loaded via a namespace (and not attached):
 [1] affy_1.32.0           affyio_1.22.0         BiocInstaller_1.2.1  
 [4] grid_2.14.0           hdrcde_2.15           IRanges_1.12.5       
 [7] KernSmooth_2.23-7     lattice_0.20-0        MASS_7.3-16          
[10] Matrix_1.0-2          mgcv_1.7-11           nlme_3.1-102         
[13] preprocessCore_1.16.0 xtable_1.6-0          zlibbioc_1.0.0       


