[BioC] annotations for Codelink arrays

Fri Oct 14 13:20:14 CEST 2005

Hi,

I build annotations for Codelink rat whole genome bioarrays quite
regularly, and have done so at least once for the human and mouse whole
genome ones. I have used the gene list available on the GE Healthcare
(Amersham) web page, which contains mappings to Entrez Gene and others.
There are a number of issues that I have found and could not resolve,
and they prevent me from making the annotations available:

In the gene list there is a field (PUB_PROBE_TARGETS) stating whether
a probe is SINGLE, DUPLICATE or MULTIPLE. MULTIPLE means that one probe
on the array (30 nucleotides long) maps to more than one GenBank
sequence, so there may be no unique Entrez Gene correspondence. I opted
to list all the GenBank accession numbers and to provide no other
mapping for such a probe (although the links produced by htmlpage() in
the annotate package make it easy to look up the different GenBank
sequences).
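
To illustrate the handling of MULTIPLE probes described above, here is a
minimal Python sketch. It is not the script actually used. Only
PUB_PROBE_TARGETS comes from the gene list as described here; the column
names PROBE_NAME and ACCN, the ";" separator and the accession values
are assumptions made for the example.

```python
# Sketch only: PROBE_NAME, ACCN and the ";" separator are assumed names,
# not taken from the real Codelink gene list.
def split_probes(rows, id_col="PROBE_NAME", acc_col="ACCN", sep=";"):
    """Separate probes flagged MULTIPLE from all other probes.

    rows is an iterable of dicts, one per gene-list record.
    Returns two dicts mapping probe ID -> list of GenBank accessions.
    """
    unique, multiple = {}, {}
    for row in rows:
        # Keep every accession listed for the probe, dropping empty pieces.
        accs = [a.strip() for a in row[acc_col].split(sep) if a.strip()]
        if row["PUB_PROBE_TARGETS"].strip().upper() == "MULTIPLE":
            # No unique Entrez Gene can be assumed: keep accessions only.
            multiple[row[id_col]] = accs
        else:
            unique[row[id_col]] = accs
    return unique, multiple


# Made-up records, for illustration only.
rows = [
    {"PROBE_NAME": "P1", "PUB_PROBE_TARGETS": "SINGLE", "ACCN": "NM_000001"},
    {"PROBE_NAME": "P2", "PUB_PROBE_TARGETS": "MULTIPLE",
     "ACCN": "NM_000002; NM_000003"},
]
unique, multiple = split_probes(rows)
```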

There are also some probes labelled CODELINK_UNIQUE (in the field for
the GenBank accession number) that have no mapping except UniGene IDs.
These may not be stable, so it is not possible to use them with
AnnBuilder (I think so, and I also tried).

Finally, there are probes labelled COMPUGEN_UNIQUE (in the field for
the GenBank accession number) that also have no mapping, except that
the LEGACY_PROBE_NAME field holds something like a GenBank accession
number with the tail _PROBE1. For these, the Perl script I use to
extract GenBank accession numbers takes this "mapping".
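
As a rough illustration of that last step (written in Python, and not
the script mentioned above), the tail could be stripped as follows. The
function name and the sample value are hypothetical, and accepting any
digits after _PROBE, rather than only _PROBE1, is an assumption.

```python
import re

# Assumed pattern: "_PROBE" followed by digits at the end of the name.
_TAIL = re.compile(r"_PROBE\d+$")


def accession_from_legacy_name(legacy_name):
    """Return the accession-like prefix of a LEGACY_PROBE_NAME value.

    Returns None when the expected tail is missing, so that such
    records can be inspected by hand instead of being guessed at.
    """
    name = legacy_name.strip()
    if not _TAIL.search(name):
        return None
    return _TAIL.sub("", name)


# Made-up value, for illustration only.
acc = accession_from_legacy_name("AB123456_PROBE1")
```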

These issues may be important because, for example, there are 693
probes labelled MULTIPLE in the Rat Whole Genome array (and the number
may grow when the company updates its gene list), 622 are
CODELINK_UNIQUE and 25 are COMPUGEN_UNIQUE. That makes more than 1300
probes, which account for nearly 4% of the probes.

1) In the case of MULTIPLE probes: can AnnBuilder detect when a
coherent mapping from the different GenBank accession numbers to Entrez
Gene exists, and then use this mapping? Or, when it finds two GenBank
accession numbers associated with one probe, does it skip the mapping
altogether?

2) For the CODELINK_UNIQUE probes: until we can get mappings to GenBank
accession numbers, is there any possibility of using the mappings to
UniGene?
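
To make question 1 concrete, this is the kind of rule I mean by a
"coherent mapping", sketched in Python. It says nothing about what
AnnBuilder actually does. The function name, the lookup table and all
IDs are hypothetical, and ignoring accessions that have no known gene is
an assumption.

```python
def coherent_gene(accessions, acc_to_gene):
    """Return the single Entrez Gene ID shared by a probe's accessions.

    acc_to_gene is a dict from GenBank accession to Entrez Gene ID.
    Accessions missing from the table are ignored (an assumption).
    Returns None when no accession maps, or when the mapped accessions
    disagree about the gene.
    """
    genes = {acc_to_gene[a] for a in accessions if a in acc_to_gene}
    if len(genes) == 1:
        return next(iter(genes))
    return None


# Made-up lookup table, for illustration only.
table = {"NM_000002": "1001", "NM_000003": "1001", "NM_000004": "2002"}
same = coherent_gene(["NM_000002", "NM_000003"], table)
mixed = coherent_gene(["NM_000002", "NM_000004"], table)
```

Here `same` would be usable as the probe's Entrez Gene ID, while `mixed`
is None and the probe would stay unmapped.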

Thanks.

D.

On 13/10/2005, at 3:14, Robert Gentleman wrote:

> Hi Tao,
>   If the right set of mappings is available to get started, AnnBuilder
> is pretty easy to use. We can help you with the first one or two, and
> are happy to distribute them. If there is more widespread interest (and
> they are stable) we can add them to the build process.
>
>   Robert
>
> Shi, Tao wrote:
>
>> Any plans to create annotation packages for Codelink arrays?
>>
>> ...Tao
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>>
>
> -- 
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org