[BioC] problem illumina annotation with lumi

Thu Dec 10 16:04:44 CET 2009

Dear List,

I am trying to annotate some Illumina microarray probes (humanHT12v3) from an experiment
of 96 samples. Apparently there are some differences between annotating with
illuminaHumanv3BeadID.db and annotating with lumiHumanAll.db.

Here is what I have done, in brief:

1. Read and processed the raw data file with the lumi package (this includes detection p-value filtering)
2. Found some differentially expressed genes using a linear model

Now my topTable looks something like this:

>  top<- topTable(aneu348_fit2,coef=2,adjust="BH")
>  top

            ID      logFC  AveExpr         t      P.Value  adj.P.Val        B
19287  730612  0.1968519 6.506182  5.446788 1.729911e-06 0.03507526 4.750244
19897 3520463  0.3286017 7.057259  5.390423 2.103278e-06 0.03507526 4.580566
3028  2650605  0.4613558 7.115252  5.309757 2.780147e-06 0.03507526 4.338214
3956  3310538  0.5527499 8.000359  5.113185 5.466881e-06 0.05172900 3.750403
1626  3390605 -0.2277937 6.930935 -4.890353 1.168046e-05 0.07558592 3.089894
25875 6280470  0.5706626 7.235376  4.841339 1.378711e-05 0.07558592 2.945587
34978 6760546  0.3195073 7.659098  4.783197 1.677400e-05 0.07558592 2.774918
32380 3940692 -0.2995773 8.258397 -4.756620 1.834288e-05 0.07558592 2.697098
35264 1740020 -0.3454641 7.384281 -4.734429 1.976252e-05 0.07558592 2.632216
33126 6040398  0.5112817 7.517186  4.731312 1.997039e-05 0.07558592 2.623109

Then I try to annotate the top IDs with gene name, gene symbol, Entrez ID and others.

** As you can see from the topTable result, my probe IDs are the
Array_Address_Id values (according to the Illumina manifest file HumanHT-12_v3_0_R2_11283641_A)

>   geneSymbol<- getSYMBOL(aneu348_probeList, 'illuminaHumanv3BeadID.db')
>   geneName<- sapply(lookUp(aneu348_probeList, 'illuminaHumanv3BeadID.db', 'GENENAME'), function(x) x[1])

  gives me the correct geneName and Symbol. (according to the manifest file)

  But when I try to convert these probe IDs using the IlluminaID2nuID() or probeID2nuID() function,
  they map to a completely different set of gene names and symbols.

I then added "000" in front of all of my probe IDs and passed them to the IlluminaID2nuID() function:

>  top<- paste("000",top,sep="")
>  illu<- IlluminaID2nuID(top)

Warning messages:
1: In getChipInfo(IlluminaID, lib.mapping = lib.mapping, species = species,  :
   Some input IDs can not be matched!

2: In if (!is.na(chipInfo$IDType)) { :
   the condition has length>  1 and only the first element will be used

>  illu[1,]         # Here illu[1,] holds the mapping for "000730612"

Search_Key        ILMN_Gene        Accession           Symbol
               NA               NA               NA               NA
         Probe_Id Array_Address_Id             nuID
               NA               NA               NA
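
Since the warning only says that some IDs could not be matched, it may help to list exactly which ones came back empty. The following base-R sketch does this on a toy matrix shaped like the result above; the matrix contents ("GENE_A", "PROBE_A") are made-up placeholders, not real annotation, and the two-column layout is a simplification.

```r
# Toy stand-in for a batch conversion result: one row per input ID.
# "GENE_A" and "PROBE_A" are made-up placeholders, not real annotation.
illu <- rbind(
  "000730612"  = c(Symbol = NA, Probe_Id = NA),
  "0003520463" = c(Symbol = "GENE_A", Probe_Id = "PROBE_A")
)

# IDs whose row is NA in every column, i.e. the ones that were not matched
unmatched <- rownames(illu)[apply(is.na(illu), 1, all)]
unmatched
```

Comparing nchar(unmatched) with the lengths of the matched IDs is a quick way to see whether the failures share a pattern.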

Now, for some reason, it always shows "NA" for a few of the probes, even though passing
them individually to the function returns the correct mapping:

>  IlluminaID2nuID(top[1])     # here  top[1] = "000730612"

          Search_Key   ILMN_Gene Accession     Symbol  Probe_Id
000730612 "ILMN_10981" "HTRA1"   "NM_002775.3" "HTRA1" "ILMN_1676563"
          Array_Address_Id nuID
000730612 "000730612"      "ZEObIyCCVRqJSjqHrY"
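
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the IDs in the topTable have either six or seven digits, so prepending a fixed "000" produces strings of different lengths. The base-R sketch below shows this and pads to one common width instead. `pad_ids` is a hypothetical helper, and the width of 10 is only an illustration; the correct width would have to be checked against the Array_Address_Id column of the manifest.

```r
# IDs taken from the topTable output above
ids <- c("730612", "3520463", "2650605")

# Prepending a fixed prefix gives strings of mixed length
naive <- paste("000", ids, sep = "")
nchar(naive)

# Hypothetical helper: left-pad with zeros to one common width.
# IDs already at or beyond the width are returned unchanged.
pad_ids <- function(ids, width) {
  ids <- as.character(ids)
  n_missing <- pmax(0, width - nchar(ids))
  paste0(strrep("0", n_missing), ids)
}

padded <- pad_ids(ids, width = 10)  # width = 10 is an assumption
padded
```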

So my questions are :

1. Why can the above functions not find an entry for a few probe IDs even though the entries are present?
2. Is the workaround I found (adding "000" at the beginning) correct, or are there
    better options?

3. Even though it's a Human-HT12 chip, getChipInfo() gives

    getChipInfo(aneu348_N)
    $chipVersion
    [1] "HumanWG6_V2_11223189_B"

4. I am trying to develop a workflow that handles data with both types of probe ID
    pattern (i.e. "ILMN_1805" or "730612"). What would be the standard way to annotate
    both types of data?
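
For question 4, a first step in such a workflow could be to detect which ID pattern each probe uses before choosing an annotation route. This is a minimal base-R sketch under the assumption that the two patterns are exactly "ILMN_" followed by digits and a purely numeric string; `classify_probe_ids` and its labels are hypothetical names, not part of lumi or any annotation package.

```r
# Hypothetical helper: label each ID by the pattern it follows.
classify_probe_ids <- function(ids) {
  ids <- as.character(ids)
  type <- rep("unknown", length(ids))
  type[grepl("^ILMN_[0-9]+$", ids)] <- "probe_id"          # e.g. "ILMN_1805"
  type[grepl("^[0-9]+$", ids)]      <- "array_address_id"  # e.g. "730612"
  type
}

ids <- c("ILMN_1805", "730612", "000730612", "foo")
by_type <- split(ids, classify_probe_ids(ids))
by_type
```

Each group in `by_type` could then be passed to whichever mapping function suits that ID type, with the "unknown" group flagged for manual inspection.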

Please accept my apologies if this mail seems long. I tried to provide as much detail as I could.

Thanks in advance,

regards,
Mamun