[BioC] Annotations dealing with "removed" refseq record

Francois Pepin fpepin at cs.mcgill.ca
Fri Jun 8 22:31:30 CEST 2007


Hi,

I think the annotation system has problems dealing with RefSeq records
that have been removed.

I was looking at the Erbb2 gene in mouse (entrezID=13866) on the whole
genome mouse chip from Agilent (annotation package: mgug4122a). According
to the annotations provided by Agilent, two probes map to it:
A_52_P49250 and A_51_P216179.

Currently, the annotation package returns NA for the symbol of both probes:

> library(mgug4122a)
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
 A_52_P49250 A_51_P216179
          NA           NA

The accession number given for both probes is indeed NM_010152:

> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
 A_52_P49250 A_51_P216179
 "NM_010152"  "NM_010152"

Looking at it on the NCBI website, it does point to Erbb2, but it also
says: "This record was removed by RefSeq staff".

Not being entirely familiar with the process, I would point to this as a
likely reason for the lack of annotations for those two probes.

I have not done an extensive check between the Agilent annotation and
the ones in mgug4122a to see how many other probes might be hit by this.
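
A rough sketch of how such a check could start, using only the package
itself: list the probes that have an accession number but no symbol. The
helper name probes_missing_symbol is made up for illustration, and the toy
data below (apart from the two probes discussed above) is invented so the
example runs without the annotation package. It assumes one accession
number and at most one symbol per probe.

```r
# Hypothetical helper: return the IDs of probes that have an accession
# number but no gene symbol. Both arguments are named lists (or named
# vectors) keyed by probe ID, with one value per probe.
probes_missing_symbol <- function(accnum, symbol) {
  acc <- unlist(accnum)
  # Reorder the symbols so they line up with the accession numbers.
  sym <- unlist(symbol)[names(acc)]
  names(acc)[!is.na(acc) & is.na(sym)]
}

# Toy data: the first two entries mirror the probes above; the other two
# are invented placeholders.
accnum <- list(A_52_P49250  = "NM_010152",
               A_51_P216179 = "NM_010152",
               probe_x      = "ACC_EXAMPLE",
               probe_y      = NA)
symbol <- list(A_52_P49250  = NA,
               A_51_P216179 = NA,
               probe_x      = "GeneX",
               probe_y      = NA)

missing <- probes_missing_symbol(accnum, symbol)
print(missing)
```

With the package loaded, the same helper could presumably be applied to
the whole chip via as.list(mgug4122aACCNUM) and as.list(mgug4122aSYMBOL),
and length() of the result would give the number of affected probes. This
only finds probes without a symbol; it does not tell whether a removed
RefSeq record is the cause in each case.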

> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
LC_IDENTIFICATION=C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  
[6] "methods" "base"

other attached packages:
mgug4122a
 "1.16.0"

If there is any more information I can provide, please tell me.

Francois


