[BioC] lumi annotation using nuID, missing gene symbols

Gilbert Feng g-feng at northwestern.edu
Fri May 7 19:05:46 CEST 2010


Hi, John

Thanks for choosing lumi.

First of all, we recommend using the latest Bioconductor release (2.6) and the
latest lumiHumanAll.db (1.10.0), which supports mapping one nuID to multiple genes.

Not all annotated probes align well to their corresponding genes. Therefore,
only probes with good alignment scores are annotated in lumiHumanAll.db, which
is why fewer probes receive a gene symbol from it than carry one in the raw
Illumina file.
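
To see how this filtering shows up in your own numbers, one option is to put the symbol from the raw Illumina file next to the symbol returned by getSYMBOL() and cross-tabulate the two. The sketch below uses a small made-up data frame: PROBE_ID and SYMBOL mirror the columns of your raw file, while dbSymbol is a hypothetical stand-in for the database result, which in practice would first have to be lined up with the raw probe IDs.

```r
# Made-up toy data, one row per probe (not real Illumina IDs or genes).
# SYMBOL mimics the raw-file column ("" = no symbol from Illumina);
# dbSymbol is a hypothetical stand-in for the getSYMBOL() result (NA = unannotated).
probes <- data.frame(
  PROBE_ID = c("probe1", "probe2", "probe3", "probe4", "probe5"),
  SYMBOL   = c("GENE1", "GENE2", "", "GENE4", ""),
  dbSymbol = c("GENE1", NA, NA, "GENE4", NA),
  stringsAsFactors = FALSE
)

# Cross-tabulate: does the raw file give a symbol, and does the database?
tab <- table(vendor   = probes$SYMBOL != "",
             database = !is.na(probes$dbSymbol))
print(tab)

# Probes Illumina annotates but the database leaves unannotated; under the
# alignment-score filtering described above, these are the likely poor aligners.
dropped <- probes$PROBE_ID[probes$SYMBOL != "" & is.na(probes$dbSymbol)]
print(dropped)
```

The "vendor TRUE, database FALSE" cell is the gap you are seeing between the raw file and lumiHumanAll.db.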

The annotation mapping is based on annotation files from the Computational
Biology Group at the University of Cambridge
(http://www.compbio.group.cam.ac.uk/Resources/Annotation/). From each
annotation file, only probes with perfect or good alignment scores are kept
in lumiHumanAll.db, lumiMouseAll.db and lumiRatAll.db. You can use those
annotation files to do the mapping yourself.
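
A minimal sketch of doing the mapping yourself, assuming one of those annotation files has been downloaded and read into R. The column names ProbeID, ProbeQuality and Symbol, the quality labels "Perfect" and "Good", and all values below are assumptions for illustration only; check the header of the file you actually download. The results table is assumed to be keyed by the raw Illumina probe ID; if yours is keyed by nuID, the IDs would need converting back first (lumi's nuID2IlluminaID() is intended for that; see its help page). Keeping the raw-file SYMBOL next to the filtered symbol also puts the original annotation onto a topTable-style result.

```r
# Hypothetical excerpt of an annotation file; column names are assumptions.
annot <- read.delim(text = paste(
  "ProbeID\tProbeQuality\tSymbol",
  "probe1\tPerfect\tGENE1",
  "probe2\tBad\tGENE2",
  "probe4\tGood\tGENE4",
  sep = "\n"), stringsAsFactors = FALSE)

# Keep only well-aligned probes, mirroring the filtering described above.
goodAnnot <- annot[annot$ProbeQuality %in% c("Perfect", "Good"), ]

# Made-up stand-in for a limma topTable() result keyed by raw probe ID.
results <- data.frame(ID = c("probe4", "probe2", "probe1"),
                      logFC = c(1.2, -0.8, 0.3),
                      stringsAsFactors = FALSE)

# Toy version of the PROBE_ID/SYMBOL columns read from the raw file.
vendor <- data.frame(PROBE_ID = c("probe1", "probe2", "probe4"),
                     SYMBOL   = c("GENE1", "GENE2", "GENE4"),
                     stringsAsFactors = FALSE)

# Attach both annotations by probe ID; match() keeps the topTable row order
# and yields NA for probes missing from a table.
results$vendorSymbol  <- vendor$SYMBOL[match(results$ID, vendor$PROBE_ID)]
results$curatedSymbol <- goodAnnot$Symbol[match(results$ID, goodAnnot$ProbeID)]
print(results)
```

Using match() rather than merge() preserves the ranking of the results table; probe2 keeps its raw-file symbol but gets NA from the filtered annotation.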

I hope this answers your question!

Gilbert 


On 5/7/10 8:20 AM, "John Coulthard" <bahhab at hotmail.com> wrote:

> 
> Dear List
> 
> I'm analyzing some Illumina data (HumanHT12_V3_0_R1_11283641_A) using the lumi
> package. The raw data has 48803 probes, 36157 of which have a gene symbol
> annotation and 12646 of which don't. When I use lumi to convert the probe IDs
> to nuIDs and then annotate the nuIDs, only 25935 probes get a gene symbol.
> 
> That many probes can't have had their gene symbol annotation deleted, so what
> am I doing wrong? Is there a way to get the probe IDs and gene symbols that
> came with the raw data onto my topTable output after the analysis?
> 
> My working is below (not the full analysis, just an example of how I did the
> annotation step).
> 
> Thanks for your time.
> 
> John
> 
> 
> 
> 
>> lumidata<-lumiR("Sample Probe Profile_rawdata.txt",
>> lib.mapping='lumiHumanIDMapping')
> Perform Quality Control assessment of the LumiBatch object ...
> Duplicated IDs found and were merged!
>> f <- exprs(lumidata)
>> g<-as.matrix(rownames(f))
>> f<-as.data.frame(cbind(f,g) )
>> head(f)
>                      1   2   3   4                V25
> Ku8QhfS0n_hIOABXuE  92  84  75  79 Ku8QhfS0n_hIOABXuE
> fqPEquJRRlSVSfL.8A 113 120 111 109 fqPEquJRRlSVSfL.8A
> ckiehnugOno9d7vf1Q 107 104  94  94 ckiehnugOno9d7vf1Q
> x57Vw5B5Fbt5JUnQkI  93  83  94  94 x57Vw5B5Fbt5JUnQkI
> ritxUH.kuHlYqjozpE  93  97  77  89 ritxUH.kuHlYqjozpE
> QpE5UiUgmJOJEkPXpc 102  95  97  92 QpE5UiUgmJOJEkPXpc
> 
>> f$Symbol<-if (require(lumiHumanAll.db)) getSYMBOL(f$V25, 'lumiHumanAll.db')
>> sum(is.na(f$Symbol))
> [1] 22868
> 
> 
>> data<-read.csv("Sample Probe Profile_rawdata.txt", header = TRUE, sep="\t")
>> names(data)
>   [1] "PROBE_ID"           "SYMBOL"             "X1.AVG_Signal"      "X1.Detection.Pval"  "X1.NARRAYS"         "X1.ARRAY_STDEV"     "X1.BEAD_STDERR"
> 
>     ...
> 
>> sum(is.na(data$SYMBOL))
> [1] 0
>> sum(data$SYMBOL=="")
> [1] 12646
>> sum(data$SYMBOL!="")
> [1] 36157
> 
> 
> 
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-redhat-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C
>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] beadarray_1.14.0         lumiHumanIDMapping_1.4.0 limma_3.2.3              lumi_1.12.4              MASS_7.3-4               preprocessCore_1.8.0
>  [7] mgcv_1.6-1               affy_1.24.2              lumiHumanAll.db_1.8.1    org.Hs.eg.db_2.3.6       RSQLite_0.8-4            DBI_0.2-5
> [13] annotate_1.24.1          AnnotationDbi_1.8.2      Biobase_2.6.1
> 
> loaded via a namespace (and not attached):
>  [1] affyio_1.14.0      grid_2.10.1        hwriter_1.2        KernSmooth_2.23-3  lattice_0.17-26    Matrix_0.999375-33 nlme_3.1-96
>  [8] tcltk_2.10.1       tools_2.10.1       xtable_1.5-6
>> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-----------------------------------------------
Gang (Gilbert) Feng, PhD
Biomedical Informatics Center
Robert H. Lurie Comprehensive Cancer Center
Northwestern University
750 N. Lake Shore Drive, 11th Floor (11-175e)
Chicago, IL  60611
Phone: 312-503-2358
Email: g-feng (at) northwestern.edu


