[BioC] lumi annotation using nuID, missing gene symbols

Marc Carlson mcarlson at fhcrc.org
Fri May 7 19:45:12 CEST 2010


Hi John,

I don't have enough of your code to know for sure, but you might not be
doing anything wrong.  The lumi package maps the individual Illumina
probes onto the appropriate (current) RefSeq or Entrez Gene IDs, which
in turn connects them to the appropriate gene symbol.  In doing so, you
are no longer blindly trusting the mappings from Illumina, so your
mappings will be more cautious/conservative than the output you got
directly from Illumina.  Being more careful can mean finding fewer
matches, since not every probe measures what it was initially designed
to measure.  You can read more about the details in the vignettes for
the lumi package here (section 3.2 of the 1st vignette):

http://www.bioconductor.org/packages/release/bioc/html/lumi.html
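
If you want to see exactly which probes lose their symbol, and keep
Illumina's own symbols next to the more conservative ones in your
results table, something along these lines might work.  This is only a
minimal base R sketch on toy data: the two data frames, and every probe
ID, nuID, symbol and logFC value in them, are made up for illustration.
`illumina` stands in for the PROBE_ID/SYMBOL columns of the raw file and
`top` for a results table that still carries the original probe ID.

```r
# Toy stand-ins; every ID, symbol and value below is made up for illustration.
# 'illumina' mimics the PROBE_ID/SYMBOL columns of the raw Illumina file.
illumina <- data.frame(
  PROBE_ID = c("ILMN_0001", "ILMN_0002", "ILMN_0003", "ILMN_0004"),
  SYMBOL   = c("GENEA", "GENEB", "", "GENED"),
  stringsAsFactors = FALSE
)

# 'top' mimics a results table keyed by nuID, with the symbol taken from the
# nuID-based annotation (NA where no trusted mapping exists).
top <- data.frame(
  nuID     = c("nuA", "nuB", "nuC", "nuD"),
  PROBE_ID = c("ILMN_0001", "ILMN_0002", "ILMN_0003", "ILMN_0004"),
  Symbol   = c("GENEA", NA, NA, "GENED"),
  logFC    = c(1.2, -0.8, 0.3, 2.1),
  stringsAsFactors = FALSE
)

# Attach Illumina's own symbol to each row by matching on the probe ID.
top$IlluminaSymbol <- illumina$SYMBOL[match(top$PROBE_ID, illumina$PROBE_ID)]

# Treat Illumina's empty strings as missing so the two columns are comparable.
top$IlluminaSymbol[top$IlluminaSymbol %in% ""] <- NA

# Cross-tabulate which source provides a symbol for each probe.
comparison <- table(
  illumina_has_symbol = !is.na(top$IlluminaSymbol),
  lumi_has_symbol     = !is.na(top$Symbol)
)
print(comparison)

# Probes annotated by Illumina but dropped by the more conservative mapping.
dropped <- top[!is.na(top$IlluminaSymbol) & is.na(top$Symbol), ]
print(dropped)
```

With real data you would need the probe ID to nuID correspondence to
build the PROBE_ID column; how you obtain it depends on your setup.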


How best to match probes or sets of probes onto annotations is a pretty
huge topic.  But if you feel that the lumi package is being too
conservative in its assignments, you can always build your own
annotation package using SQLForge from the AnnotationDbi package to
match up the probes or groups of probes with whatever
RefSeq/GenBank/Entrez IDs you are willing to trust with your data.  The
instructions for using SQLForge are here in case you want to do that:

http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html
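
To give a rough idea of what that involves, here is a hedged sketch, not
a tested recipe for your chip.  SQLForge takes a headerless,
tab-delimited file with the probe ID in the first column and the ID you
trust in the second.  The probe names and RefSeq accessions below are
placeholders, and `build_chip_package`, the "myChip" prefix and the chip
name are hypothetical names chosen for illustration.  The sketch also
assumes that makeDBPackage() is available from the AnnotationForge
package (where SQLForge lives in later Bioconductor releases) and that
the human.db0 package is installed; check the vignette for your version.

```r
# Hypothetical probe-to-RefSeq map; the IDs are placeholders, not real
# annotations for any chip.
probe_map <- data.frame(
  probe  = c("nuA", "nuB", "nuC"),
  refseq = c("NM_000001", "NM_000002", "NM_000003"),
  stringsAsFactors = FALSE
)

# Write the two-column, tab-delimited, headerless file that SQLForge expects.
map_file <- file.path(tempdir(), "myChip_IDs.txt")
write.table(probe_map, file = map_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Hypothetical wrapper around makeDBPackage(); it is defined here but not
# called, because it needs AnnotationForge and human.db0 to be installed.
build_chip_package <- function(map_file, out_dir) {
  AnnotationForge::makeDBPackage(
    "HUMANCHIP_DB",
    affy            = FALSE,
    prefix          = "myChip",
    fileName        = map_file,
    baseMapType     = "refseq",   # second column holds RefSeq accessions
    outputDir       = out_dir,
    version         = "0.0.1",
    manufacturer    = "Illumina",
    chipName        = "My custom chip",
    manufacturerUrl = "http://www.illumina.com"
  )
}

# Usage, once the packages above are installed:
# build_chip_package(map_file, tempdir())
```

The resulting package directory can then be installed and used in place
of the stock annotation package.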

I hope you find this helpful,


  Marc




On 05/07/2010 06:20 AM, John Coulthard wrote:
> Dear List
>
> I'm analyzing some (HumanHT12_V3_0_R1_11283641_A) Illumina data using the lumi package.  The raw data has 48803 probes, 36157 of which have a gene symbol annotation and 12646 don't.  When I use lumi, and convert probe ids to nuIDs, then annotate the nuIDs I only get 25935 probes annotated with a gene symbol.  
>
> There can't be that many probes which have had their annotated gene symbol deleted, so what am I doing wrong?
> Is there a way to get the probe_ids and gene symbols that came with the raw data onto my TopTable post analysis?
>
> My working below (not the full analysis just an example of how I did the annotation bit).
>
> Thanks for your time.
>
> John
>
>
>
>
>   
>> lumidata<-lumiR("Sample Probe Profile_rawdata.txt", lib.mapping='lumiHumanIDMapping')
>>     
> Perform Quality Control assessment of the LumiBatch object ...
> Duplicated IDs found and were merged!
>   
>> f <- exprs(lumidata)
>> g<-as.matrix(rownames(f))
>> f<-as.data.frame(cbind(f,g) )
>> head(f)
>>     
>                      1   2   3   4                V25
> Ku8QhfS0n_hIOABXuE  92  84  75  79 Ku8QhfS0n_hIOABXuE
> fqPEquJRRlSVSfL.8A 113 120 111 109 fqPEquJRRlSVSfL.8A
> ckiehnugOno9d7vf1Q 107 104  94  94 ckiehnugOno9d7vf1Q
> x57Vw5B5Fbt5JUnQkI  93  83  94  94 x57Vw5B5Fbt5JUnQkI
> ritxUH.kuHlYqjozpE  93  97  77  89 ritxUH.kuHlYqjozpE
> QpE5UiUgmJOJEkPXpc 102  95  97  92 QpE5UiUgmJOJEkPXpc
>
>   
>> f$Symbol<-if (require(lumiHumanAll.db)) getSYMBOL(f$V25, 'lumiHumanAll.db')
>> sum(is.na(f$Symbol))
>>     
> [1] 22868
>
>
>   
>> data<-read.csv("Sample Probe Profile_rawdata.txt", header = TRUE, sep="\t")
>> names(data)
>>     
>   [1] "PROBE_ID"           "SYMBOL"             "X1.AVG_Signal"      
> "X1.Detection.Pval"  "X1.NARRAYS"         "X1.ARRAY_STDEV"     "X1.BEAD_STDERR"    
>
>     ...
>
>   
>> sum(is.na(data$SYMBOL))
>>     
> [1] 0
>   
>> sum(data$SYMBOL=="")
>>     
> [1] 12646
>   
>> sum(data$SYMBOL!="")
>>     
> [1] 36157
>
>
>
>
>   
>> sessionInfo()
>>     
> R version 2.10.1 (2009-12-14) 
> i386-redhat-linux-gnu 
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C             
>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
>  [1] beadarray_1.14.0         lumiHumanIDMapping_1.4.0 limma_3.2.3              lumi_1.12.4              MASS_7.3-4               preprocessCore_1.8.0    
>  [7] mgcv_1.6-1               affy_1.24.2              lumiHumanAll.db_1.8.1    org.Hs.eg.db_2.3.6       RSQLite_0.8-4            DBI_0.2-5               
> [13] annotate_1.24.1          AnnotationDbi_1.8.2      Biobase_2.6.1           
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.14.0      grid_2.10.1        hwriter_1.2        KernSmooth_2.23-3  lattice_0.17-26    Matrix_0.999375-33 nlme_3.1-96       
>  [8] tcltk_2.10.1       tools_2.10.1       xtable_1.5-6      
>   
>>     
>


