[BioC] Illumina annotation packages discrepancy

Renaud Gaujoux renaud at mancala.cbio.uct.ac.za
Tue Dec 2 09:21:26 CET 2008


I had a quick try but got only NAs. Should the code below work with 
this package?

entrez <- getEG(probeids, 'illuminaHumanv2ProbeID.db')

which wraps:

unlist(lookUp(probeids, 'illuminaHumanv2ProbeID.db', "ENTREZID"))

I tried with probeids being Illumina full IDs, Illumina trimmed IDs 
(without ILMN_), and with nuIDs.
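
When every lookup returns NA, one thing worth checking is whether the query IDs are in the same form as the keys of the map. Below is a minimal base-R sketch of that check on toy data. The key values, the placeholder nuID strings and the helper names (key_match_rate, count_na) are assumptions made up for illustration and are not taken from any package. With a real annotation package the keys would have to be pulled from the map object itself, which is not shown here.

```r
# Toy stand-in for the keys of an annotation map. These values are made up
# for illustration and are NOT taken from any real package.
map_keys <- c("ILMN_1001", "ILMN_1002", "ILMN_1003")

# The same three probes written in the three forms tried above.
# The nuID-like strings are placeholders, not real nuIDs.
id_forms <- list(
  full    = c("ILMN_1001", "ILMN_1002", "ILMN_9999"),
  trimmed = c("1001", "1002", "9999"),
  nuID    = c("placeholderA", "placeholderB", "placeholderC")
)

# Fraction of the IDs in one form that appear among the map keys.
key_match_rate <- function(ids, keys) mean(ids %in% keys)

match_rates <- vapply(id_forms, key_match_rate, numeric(1), keys = map_keys)
print(match_rates)

# Count the NAs in a lookUp()-style result (a named list with NA for misses).
count_na <- function(res) sum(is.na(unlist(res)))

toy_result <- list(ILMN_1001 = "111", ILMN_1002 = NA, ILMN_9999 = NA)
n_missing <- count_na(toy_result)
print(n_missing)
```

A match rate of zero for every form would suggest that the map is keyed on yet another kind of identifier, rather than that the probes are genuinely unannotated.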

Thanks,
Renaud

Lynn Amon wrote:
> You'll want to use the illuminaHumanv2ProbeID.db package.
> Lynn
>
> Renaud Gaujoux wrote:
>> Oops... I'm really sorry for the confusion, Mark. I think I misread the 
>> vignette.
>>
>> I BLASTed some of the missing probes and some of them gave quite 
>> convincing results (100% identity but with different variants), 
>> others didn't return any sequence. So I'll try with the package from 
>> 2.2.
>>
>> Thanks again,
>> Renaud
>>
>> Lynn Amon wrote:
>>> The illuminaHumanv2.db package is not a "proprietary" package. It 
>>> is currently maintained by Mark Dunning 
>>> (Mark.Dunning at cancer.org.uk). It is based on BLASTed sequences, 
>>> but there was a problem when the package was created: when more 
>>> than one accession was assigned to a probe, the annotation program 
>>> skipped that probe entirely, which is why you are finding so many 
>>> probes without annotation. You should contact Mark to find out 
>>> whether that problem has been corrected and a new version released. 
>>> You could also try the 2.2 release, which I created and which has 
>>> annotation for all those probes.
>>> Lynn
>>>
>>>
>>> Renaud Gaujoux wrote:
>>>> Hi Pan,
>>>>
>>>> thanks for your answer. I've been (and still am) struggling a bit 
>>>> to get consistent and up to date annotation for my data.
>>>>
>>>> So, I guess it is more reliable to use the lumiHumanAll.db package?
>>>>
>>>> However, what about the probes that are not annotated in 
>>>> lumiHumanAll but look interesting for my study (i.e. they appear 
>>>> in my top lists for differential expression or classification power)?
>>>> I've got such probes that are annotated neither in lumiHumanAll.db 
>>>> nor in lumiHumanV2, but are annotated in illuminaHumanv2.
>>>>
>>>> Hence no package gives me consistent annotation for my top genes. 
>>>> However, I've got an annotation file (that came with the array data, 
>>>> I guess output by BeadStudio) that gives me annotations for all of 
>>>> my probes. But as you mentioned, these might be outdated, which 
>>>> bothers me. Any suggestions about that?
>>>>
>>>> By the way, how come even Illumina "proprietary" packages 
>>>> (illuminaHumanv2.db) don't correctly annotate their own probes? :(
>>>>
>>>> Thanks again for your help and clarification, and the lumi package.
>>>>
>>>> Renaud
>>>>
>>>>
>>>> Pan Du wrote:
>>>>> Hi Renaud,
>>>>>
>>>>> The discrepancy is due to the different mapping criteria. Both the
>>>>> "lumiHumanAll.db" and "illuminaHumanv2.db" libraries are based on
>>>>> BLAST results against the RefSeq database. The "lumiHumanAll.db"
>>>>> library is nuID indexed and includes all the probes of different
>>>>> versions. For the mapping from probe to RefSeq, it defines both
>>>>> sensitivity and specificity criteria (see the vignette
>>>>> "IlluminaAnnotation.Rnw" in the lumi package). As a result, it might
>>>>> include fewer mappings than "illuminaHumanv2.db", because
>>>>> "lumiHumanAll.db" filters out some dubious mappings (e.g., one probe
>>>>> with multiple perfect mappings).
>>>>>
>>>>> The "lumiHumanV2" library was built from the original annotation
>>>>> provided by Illumina. As a result, it has many more probe mappings.
>>>>> However, many of those mappings might be outdated because of updates
>>>>> to the genome annotation.
>>>>>
>>>>> Hope this clears up the confusion.
>>>>>
>>>>>
>>>>> Pan
>>>>>
>>>>>
>>>>> On 11/28/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
>>>>> <bioconductor-request at stat.math.ethz.ch> wrote:
>>>>>
>>>>>  
>>>>>> Date: Thu, 27 Nov 2008 16:03:36 +0200
>>>>>> From: Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za>
>>>>>> Subject: [BioC] Illumina annotation packages discrepancy
>>>>>> To: bioconductor at stat.math.ethz.ch
>>>>>> Message-ID: <492EA8B8.5000400 at cbio.uct.ac.za>
>>>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> I've got BeadSummary data from Illumina (Array content:
>>>>>> HUMANREF-8_V2_11223162_B.XML.xml).
>>>>>> I imported it in R using the function lumiR.batch.
>>>>>> This automatically computed the nuID for each probe and set the
>>>>>> annotation package to lumiHumanAll.db.
>>>>>> This is all good.
>>>>>>
>>>>>> BUT, when I do
>>>>>>
>>>>>> lookUp(nuIDs, 'lumiHumanAll.db', 'GENENAME')
>>>>>>
>>>>>> I get 2921 out of 20589 probes with NA.
>>>>>>
>>>>>> If I do the same using the old annotation package lumiHumanV2:
>>>>>>
>>>>>> lookUp(nuIDs, 'lumiHumanV2', 'GENENAME')
>>>>>>
>>>>>> I get 454 out of 20589 probes with NA.
>>>>>>
>>>>>> Finally, if I do the same using the annotation package
>>>>>> illuminaHumanv2.db (but based on the corresponding TargetIDs):
>>>>>>
>>>>>> lookUp(targetIDs, 'illuminaHumanv2.db', 'GENENAME')
>>>>>>
>>>>>> I get 2041 out of 20589 probes with NA.
>>>>>>
>>>>>> Can anybody give me an explanation for that discrepancy? And which
>>>>>> annotation package should I use, as it looks like some probes that
>>>>>> are interesting for my experiment don't have annotation in the new
>>>>>> version?
>>>>>>
>>>>>> Also, I could not find any reference to the HUMANREF-8_V2_11223162_B
>>>>>> annotation (neither on the Illumina website nor in Bioconductor
>>>>>> packages). I only found information about HUMANREF-8_V2_11223162_A.
>>>>>> Is the letter suffix (A or B) really important?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>     
>>>>>
>>>>>
>>>>> ------------------------------------------------------
>>>>> Pan Du, PhD
>>>>> Research Assistant Professor
>>>>> Northwestern University Biomedical Informatics Center
>>>>> 750 N. Lake Shore Drive, 11-176
>>>>> Chicago, IL  60611
>>>>> Office (312) 503-2360; Fax: (312) 503-5388
>>>>> dupan (at) northwestern.edu
>>>>> ------------------------------------------------------
>>


