[BioC] Illumina annotation packages discrepancy

Mon Dec 1 18:44:29 CET 2008

The illuminaHumanv2.db package is not a "proprietary" package.  It is 
currently maintained by Mark Dunning (Mark.Dunning at cancer.org.uk).  It 
is based on BLASTed sequences, but there was a problem when the package 
was created: whenever more than one accession was assigned to a probe, 
the annotation program skipped that probe, which is why you are finding 
so many probes without annotation.  You should contact Mark to find out 
whether that problem has been corrected and a new version released.  You 
could also try the 2.2 release, which I created and which has annotation 
for all those probes.
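
As a rough way to see which probes one package skips and another covers, 
here is a minimal sketch in R.  It assumes each lookup result is a named 
list with one element per probe, which is the shape annotate's lookUp() 
returns.  The helper names count_unannotated and rescued_by and the toy 
probe IDs are hypothetical, made up for illustration only.

```r
# TRUE for each probe whose lookup result contains only NA values.
# `res` is a named list with one element per probe.
is_unannotated <- function(res) {
  vapply(res, function(x) all(is.na(x)), logical(1))
}

# Number of probes without any annotation.
count_unannotated <- function(res) {
  sum(is_unannotated(res))
}

# Probes that are unannotated in `a` but annotated in `b`.
rescued_by <- function(a, b) {
  missing_a <- names(a)[is_unannotated(a)]
  present_b <- names(b)[!is_unannotated(b)]
  intersect(missing_a, present_b)
}

# Toy data (hypothetical probe IDs and gene names, not real annotation).
pkg_a <- list(p1 = "gene one", p2 = NA, p3 = NA)
pkg_b <- list(p1 = "gene one", p2 = "gene two", p3 = NA)

n_missing <- count_unannotated(pkg_a)
rescued   <- rescued_by(pkg_a, pkg_b)
print(n_missing)
print(rescued)

# With real data: runs only if the packages are installed and a
# character vector `targetIDs` (an assumption here) already exists.
if (exists("targetIDs") &&
    requireNamespace("annotate", quietly = TRUE) &&
    requireNamespace("illuminaHumanv2.db", quietly = TRUE)) {
  res <- annotate::lookUp(targetIDs, "illuminaHumanv2.db", "GENENAME",
                          load = TRUE)
  print(count_unannotated(res))
}
```

Running rescued_by() on the results from two packages lists the probes 
worth checking by hand against the BeadStudio annotation file.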
Lynn

Renaud Gaujoux wrote:
> Hi Pan,
>
> thanks for your answer. I've been (and still am) struggling a bit to 
> get consistent and up to date annotation for my data.
>
> So, I guess it is more reliable to use the lumiHumanAll.db package?
>
> However, what about the probes that are not annotated in lumiHumanAll 
> but look interesting for my study (i.e. appearing in my top lists 
> for differential expression or classification power)?
> I've got such probes that are annotated in neither lumiHumanAll.db 
> nor lumiHumanV2 but are in illuminaHumanv2.
>
> Hence no package gives me consistent annotation for my top genes. 
> However I've got an annotation file (that came with the array data, I 
> guess output by BeadStudio) that gives me annotations for all of my 
> probes. But as you mentioned, these might be outdated, which actually 
> bothers me. Any suggestion about that?
>
> By the way, how come even Illumina "proprietary" packages 
> (illuminaHumanv2.db) don't correctly annotate their own probes? :(
>
> Thanks again for your help and clarification, and the lumi package.
>
> Renaud
>
>
> Pan Du wrote:
>> Hi Renaud,
>>
>> The discrepancy is due to the different mapping criteria. Both the
>> "lumiHumanAll.db" and "illuminaHumanv2.db" libraries are based on BLAST
>> results against the RefSeq database. The "lumiHumanAll.db" library is
>> nuID indexed and includes all the probes of the different versions. For
>> the mapping from probe to RefSeq, it defines both sensitivity and
>> specificity (see the vignette "IlluminaAnnotation.Rnw" in the lumi
>> package). As a result, it might include fewer mappings than
>> "illuminaHumanv2.db", because "lumiHumanAll.db" filters out some
>> dubious mappings (e.g., one probe with multiple perfect mappings).
>>
>> The "lumiHumanV2" library was built from the original annotation
>> provided by Illumina. As a result, it has many more probe mappings.
>> However, many of these mappings might be outdated because the genome
>> annotation has since been updated.
>>
>> Hope this will clarify the confusion.
>>
>>
>> Pan
>>
>>
>> On 11/28/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
>> <bioconductor-request at stat.math.ethz.ch> wrote:
>>
>>  
>>> Date: Thu, 27 Nov 2008 16:03:36 +0200
>>> From: Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za>
>>> Subject: [BioC] Illumina annotation packages discrepancy
>>> To: bioconductor at stat.math.ethz.ch
>>> Message-ID: <492EA8B8.5000400 at cbio.uct.ac.za>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Hi list,
>>>
>>> I've got BeadSummary data from Illumina (Array content:
>>> HUMANREF-8_V2_11223162_B.XML.xml).
>>> I imported it in R using the function lumi.batch.
>>> This automatically computed the nuID for each probe and set the
>>> annotation package to lumiHumanAll.db.
>>> This is all good.
>>>
>>> BUT, when I do
>>>
>>> lookUp(nuIDs, 'lumiHumanAll.db', 'GENENAME')
>>>
>>> I get 2921 out of 20589 probes with NA.
>>>
>>> If I do the same using the old annotation package lumiHumanV2:
>>>
>>> lookUp(nuIDs, 'lumiHumanV2', 'GENENAME')
>>>
>>> I get 454 out of 20589 probes with NA.
>>>
>>> Finally, if I do the same using the annotation package
>>> illuminaHumanv2.db (but based on the corresponding TargetIDs):
>>>
>>> lookUp(targetIDs, 'illuminaHumanv2.db', 'GENENAME')
>>>
>>> I get 2041 out of 20589 probes with NA.
>>>
>>> Can anybody give me an explanation for that discrepancy? And which
>>> annotation package should I use, as it looks like some interesting
>>> probes (for my experiment) don't have annotation in the new version?
>>>
>>> Also, I could not find any reference to that HUMANREF-8_V2_11223162_B
>>> annotation (either on the Illumina website or in Bioconductor
>>> packages). I only found information about HUMANREF-8_V2_11223162_A.
>>> Is the letter suffix (A or B) really important?
>>>
>>> Thanks
>>>
>>>
>>>     
>>
>>
>> ------------------------------------------------------
>> Pan Du, PhD
>> Research Assistant Professor
>> Northwestern University Biomedical Informatics Center
>> 750 N. Lake Shore Drive, 11-176
>> Chicago, IL  60611
>> Office (312) 503-2360; Fax: (312) 503-5388
>> dupan (at) northwestern.edu
>> ------------------------------------------------------
>>  
>>
>>
>>
>>