[BioC] Illumina annotation packages discrepancy

Tue Dec 2 08:42:40 CET 2008

Oops... I'm really sorry for the confusion, Mark. I think I misread the 
vignette.

I BLASTed some of the missing probes: some gave quite convincing 
results (100% identity, but with different variants), while others 
didn't return any sequence. So I'll try the package from 2.2.

Thanks again,
Renaud

Lynn Amon wrote:
> The illuminaHumanv2.db package is not a "proprietary" package.  It is 
> currently maintained by Mark Dunning (Mark.Dunning at cancer.org.uk).  It 
> is based on BLASTed sequences, but there was a problem in creating the 
> package: when more than one accession was assigned to a probe, the 
> annotation program skipped that probe, which is why you are finding so 
> many without annotation.  You should contact Mark to find out whether 
> that problem has been corrected and a new version released.  You could 
> also try the 2.2 release, which I created and which has annotation 
> for all those probes.
> Lynn
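
To make the failure mode described above concrete, here is a minimal base R sketch. It is not the code used to build illuminaHumanv2.db; the table, probe IDs and accession strings are hypothetical placeholders. It shows which probes a "skip if more than one accession" rule would drop, and one possible alternative that keeps them.

```r
# Hypothetical probe-to-accession table; probe p2 has two accessions.
acc <- data.frame(
  probe     = c("p1", "p2", "p2", "p3"),
  accession = c("ACC_1", "ACC_2", "ACC_3", "ACC_4"),
  stringsAsFactors = FALSE
)

# Rule described above: probes with more than one accession are skipped.
counts  <- table(acc$probe)
skipped <- names(counts)[counts > 1]

# One possible alternative (an assumption, not the actual fix):
# keep every probe and join its accessions with ";".
collapsed <- tapply(acc$accession, acc$probe, paste, collapse = ";")

print(skipped)
print(collapsed)
```

Collapsing keeps the ambiguity visible instead of silently losing the probe; whether that is appropriate depends on how the downstream annotation is used.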
>
>
> Renaud Gaujoux wrote:
>> Hi Pan,
>>
>> thanks for your answer. I've been (and still am) struggling a bit to 
>> get consistent and up to date annotation for my data.
>>
>> So, I guess it is more reliable to use the lumiHumanAll.db package?
>>
>> However, what about the probes that are not annotated in 
>> lumiHumanAll.db but look interesting for my study (i.e. appearing 
>> in my top lists for differential expression or classification power)?
>> I've got such probes that are annotated neither in lumiHumanAll.db 
>> nor in lumiHumanV2, but are in illuminaHumanv2.db.
>>
>> Hence no package gives me consistent annotation for my top genes. 
>> However, I've got an annotation file (that came with the array data, I 
>> guess output by BeadStudio) that gives me annotations for all of my 
>> probes. But as you mentioned, these might be outdated, which actually 
>> bothers me. Any suggestions about that?
>>
>> By the way, how come even Illumina "proprietary" packages 
>> (illuminaHumanv2.db) don't correctly annotate their own probes? :(
>>
>> Thanks again for your help and clarification, and the lumi package.
>>
>> Renaud
>>
>>
>> Pan Du wrote:
>>> Hi Renaud,
>>>
>>> The discrepancy is due to the different mapping criteria. Both the
>>> "lumiHumanAll.db" and "illuminaHumanv2.db" libraries are based on
>>> BLAST results against the RefSeq database. The "lumiHumanAll.db"
>>> library is nuID indexed and includes all the probes of the different
>>> versions. For the mapping from probe to RefSeq, it defines both
>>> sensitivity and specificity (see the vignette
>>> "IlluminaAnnotation.Rnw" in the lumi package). As a result, it might
>>> include fewer mappings than "illuminaHumanv2.db", because
>>> "lumiHumanAll.db" filters out some dubious mappings (e.g., one probe
>>> with multiple perfect mappings).
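
As an illustration of the kind of filter mentioned above (dropping probes with multiple perfect mappings), here is a minimal base R sketch. It is not the lumi implementation; the actual criteria are defined in the vignette. The hit table, its column names, and the probe and RefSeq identifiers are hypothetical placeholders.

```r
# Hypothetical BLAST hit table: one row per probe-to-RefSeq hit.
hits <- data.frame(
  probe    = c("p1", "p2", "p2", "p3", "p3"),
  refseq   = c("NM_A", "NM_B", "NM_C", "NM_D", "NM_E"),
  identity = c(100, 100, 100, 100, 92),
  stringsAsFactors = FALSE
)

# Keep only perfect hits (100% identity in this toy example).
perfect <- hits[hits$identity == 100, ]

# Count perfect hits per probe and keep probes with exactly one.
n_perfect     <- table(perfect$probe)
unique_probes <- names(n_perfect)[n_perfect == 1]
kept          <- perfect[perfect$probe %in% unique_probes, ]

print(kept)
```

Here p2 is dropped because it maps perfectly to two sequences, which is how a stricter package can end up with fewer annotated probes than a more permissive one.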
>>>
>>> The "lumiHumanV2" library was built from the original annotation
>>> provided by Illumina. As a result, it has many more probe mappings.
>>> However, many of these mappings might be outdated because of updates
>>> to the genome annotation.
>>>
>>> Hope this will clarify the confusion.
>>>
>>>
>>> Pan
>>>
>>>
>>> On 11/28/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
>>> <bioconductor-request at stat.math.ethz.ch> wrote:
>>>
>>>  
>>>> Date: Thu, 27 Nov 2008 16:03:36 +0200
>>>> From: Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za>
>>>> Subject: [BioC] Illumina annotation packages discrepancy
>>>> To: bioconductor at stat.math.ethz.ch
>>>> Message-ID: <492EA8B8.5000400 at cbio.uct.ac.za>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>> Hi list,
>>>>
>>>> I've got BeadSummary data from Illumina (Array content:
>>>> HUMANREF-8_V2_11223162_B.XML.xml).
>>>> I imported it in R using the function lumi.batch.
>>>> This automatically computed the nuID for each probe and set the
>>>> annotation package to lumiHumanAll.db.
>>>> This is all good.
>>>>
>>>> BUT, when I do
>>>>
>>>> lookUp(nuIDs, 'lumiHumanAll.db', 'GENENAME')
>>>>
>>>> I get 2921 out of 20589 probes with NA.
>>>>
>>>> If I do the same using the old annotation package lumiHumanV2:
>>>>
>>>> lookUp(nuIDs, 'lumiHumanV2', 'GENENAME')
>>>>
>>>> I get 454 out of 20589 probes with NA.
>>>>
>>>> Finally, if I do the same using the annotation package
>>>> illuminaHumanv2.db (but based on the corresponding TargetIDs):
>>>>
>>>> lookUp(targetIDs, 'illuminaHumanv2.db', 'GENENAME')
>>>>
>>>> I get 2041 out of 20589 probes with NA.
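
For anyone reproducing these counts, a minimal base R sketch of the tally follows. It assumes each lookup result is a named list with one element per probe and NA where nothing was found, and that both results are keyed by the same probe identifier (which is not the case for nuIDs versus TargetIDs without a conversion step). The probe IDs and gene names are made-up placeholders, not values from the packages.

```r
# Hypothetical stand-ins for two lookup results (placeholder IDs and names).
res_a <- list(probe_1 = "GENE_A", probe_2 = NA, probe_3 = NA, probe_4 = "GENE_D")
res_b <- list(probe_1 = "GENE_A", probe_2 = "GENE_B", probe_3 = NA, probe_4 = NA)

# IDs of the probes whose lookup returned a single NA.
missing_ids <- function(res) {
  is_na <- vapply(res, function(x) length(x) == 1 && is.na(x), logical(1))
  names(res)[is_na]
}

miss_a <- missing_ids(res_a)
miss_b <- missing_ids(res_b)

cat(length(miss_a), "out of", length(res_a), "probes with NA in source A\n")

# Probes missing in A but annotated in B, and probes missing in both.
rescued_by_b    <- setdiff(miss_a, miss_b)
missing_in_both <- intersect(miss_a, miss_b)
```

Listing the IDs, rather than only counting them, makes it possible to check whether the unannotated probes are the ones that matter for a given analysis.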
>>>>
>>>> Can anybody give me an explanation for that discrepancy? And which
>>>> annotation package should I use, as it looks like some interesting
>>>> probes (for my experiment) don't have annotation in the new version?
>>>>
>>>> Also, I could not find any reference to that HUMANREF-8_V2_11223162_B
>>>> annotation (neither on the Illumina website nor in Bioconductor
>>>> packages). I only found information about HUMANREF-8_V2_11223162_A.
>>>> Is the letter suffix (A or B) really important?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>     
>>>
>>>
>>> ------------------------------------------------------
>>> Pan Du, PhD
>>> Research Assistant Professor
>>> Northwestern University Biomedical Informatics Center
>>> 750 N. Lake Shore Drive, 11-176
>>> Chicago, IL  60611
>>> Office (312) 503-2360; Fax: (312) 503-5388
>>> dupan (at) northwestern.edu
>>> ------------------------------------------------------
>>>  
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor