[BioC] annotations for Codelink arrays
Diego Díez Ruiz
ddiez at iib.uam.es
Fri Oct 14 13:20:14 CEST 2005
I build annotations for Codelink rat whole genome bioarrays quite
regularly, and have done so at least once for the human and mouse whole
genome arrays. I use the gene list available on the GE Healthcare
(Amersham) web page, which contains mappings to Entrez Gene and other
identifiers. I have found a number of issues that I could not resolve
and that prevent me from making the annotations available:
In the gene list there is a field (PUB_PROBE_TARGETS) that describes
a probe as SINGLE, DUPLICATE or MULTIPLE. MULTIPLE means that one
probe on the array (30 nucleotides long) maps to more than one
GenBank sequence, so there may be no unique Entrez Gene
correspondence. For these probes I opted to list all the GenBank
accession numbers and provide no other mapping (although the links
obtained through htmlpage() in the annotate package make it easier to
look up the different GenBank sequences).
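As a rough illustration of the MULTIPLE case, here is a minimal Python sketch that collects every GenBank accession listed for each probe. Only PUB_PROBE_TARGETS is a field name taken from the gene list; the PROBE_NAME and ACCESSION column names, the tab-separated layout, the one-row-per-accession assumption and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a gene list. Only PUB_PROBE_TARGETS is a field
# named in the post; PROBE_NAME and ACCESSION are assumed column names and
# the probe IDs and accessions are made up for illustration.
SAMPLE = """PROBE_NAME\tPUB_PROBE_TARGETS\tACCESSION
P0001\tSINGLE\tNM_000001
P0002\tMULTIPLE\tNM_000002
P0002\tMULTIPLE\tNM_000003
P0003\tDUPLICATE\tNM_000004
"""


def accessions_by_probe(handle):
    """Collect every GenBank accession listed for each probe.

    Assumes one row per (probe, accession) pair, which may not match the
    real file layout.
    """
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = result.setdefault(
            row["PROBE_NAME"],
            {"targets": row["PUB_PROBE_TARGETS"], "accessions": []},
        )
        if row["ACCESSION"] not in entry["accessions"]:
            entry["accessions"].append(row["ACCESSION"])
    return result


probes = accessions_by_probe(io.StringIO(SAMPLE))

# Probes flagged MULTIPLE keep all of their accessions and nothing else.
multiple = {
    probe: entry["accessions"]
    for probe, entry in probes.items()
    if entry["targets"] == "MULTIPLE"
}
```

With the sample above, `multiple` holds only the hypothetical probe P0002 with both of its accessions.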
There are also some probes labelled CODELINK_UNIQUE (in the field for
the GenBank accession number) that have no mapping except for
UniGene IDs, which may not be stable, so it is not possible to use them
with AnnBuilder (I think; I also tried).
Finally, there are probes labelled COMPUGEN_UNIQUE (in the field for
the GenBank accession number) that also have no mapping, except that the
LEGACY_PROBE_NAME field holds something like a GenBank accession
number with the tail _PROBE1. For these, the Perl script I use to
extract GenBank accession numbers takes this "mapping".
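The handling of the _PROBE1 tail can be sketched as follows. This is not the Perl script mentioned above, just a Python sketch of the same idea; the function name is hypothetical, and it assumes the tail is always "_PROBE" followed by digits at the end of the LEGACY_PROBE_NAME value.

```python
import re

# Assumed shape of the tail: "_PROBE" followed by one or more digits,
# anchored at the end of the legacy probe name.
_PROBE_TAIL = re.compile(r"_PROBE\d+$")


def accession_from_legacy_name(legacy_name):
    """Return the accession-like prefix, or None if there is no usable one."""
    name = legacy_name.strip()
    stripped, n_removed = _PROBE_TAIL.subn("", name)
    # Accept the result only if a tail was removed and something is left.
    if n_removed == 1 and stripped:
        return stripped
    return None
```

A value such as "AB012345_PROBE1" (a made-up example) would yield "AB012345", while a name without the tail returns None so that it is not mistaken for an accession.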
These issues may be important because, for example, there are 693
probes labelled MULTIPLE on the Rat Whole Genome array (and the number
may increase when the company updates its gene list), 622 probes are
CODELINK_UNIQUE and 25 are COMPUGEN_UNIQUE. That makes more than 1300
probes, which account for nearly 4% of the probes.
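The arithmetic behind these figures, as a small Python check. The three counts come from the text; the array size is not stated, so the total below is only what the rounded "4%" would imply, not a figure from the gene list.

```python
# Counts quoted above for the Rat Whole Genome gene list.
counts = {"MULTIPLE": 693, "CODELINK_UNIQUE": 622, "COMPUGEN_UNIQUE": 25}

affected = sum(counts.values())

# Assumption: treat "nearly 4%" as exactly 4% to back out a rough array size.
implied_total = round(affected / 0.04)
```

Here `affected` comes to 1340 probes, and a 4% share would correspond to an array of roughly 33,500 probes.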
1) In the case of MULTIPLE probes: can AnnBuilder detect when the
different GenBank accession numbers map coherently to a single Entrez
Gene entry and then use that mapping? Or, when it finds two GenBank
accessions associated with one probe, does it skip the mapping altogether?
2) For the CODELINK_UNIQUE probes: until we can get mappings to GenBank
accessions, is there any possibility of using the mappings to UniGene?
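To make question 1 concrete, the kind of "coherent mapping" check I have in mind could look like the Python sketch below. It says nothing about what AnnBuilder actually does; the function name, the dictionary-based lookup and the example identifiers are all hypothetical.

```python
def consensus_gene(accessions, accession_to_gene):
    """Return the single Entrez Gene ID shared by the mapped accessions.

    Returns None when the accessions map to different genes, or when none
    of them maps at all. Accessions missing from the lookup are ignored,
    which is a design choice of this sketch.
    """
    genes = {
        accession_to_gene[acc]
        for acc in accessions
        if acc in accession_to_gene
    }
    if len(genes) == 1:
        return next(iter(genes))
    return None


# Made-up lookup table from GenBank accession to Entrez Gene ID.
lookup = {"NM_000002": "1001", "NM_000003": "1001", "NM_000004": "2002"}

coherent = consensus_gene(["NM_000002", "NM_000003"], lookup)
conflicting = consensus_gene(["NM_000002", "NM_000004"], lookup)
```

In this example `coherent` is "1001" because both accessions agree, while `conflicting` is None. A stricter variant could also return None whenever any accession is unmapped.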
On 13/10/2005, at 3:14, Robert Gentleman wrote:
> Hi Tao,
> If the right set of mappings is available to get started, AnnBuilder
> is pretty easy to use. We can help you with the first one or two, and
> are happy to distribute them. If there is more widespread interest
> (and they are stable) we can add them to the build process.
> Shi, Tao wrote:
>> Any plans to create annotation packages for Codelink arrays?
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> rgentlem at fhcrc.org