[BioC] Annotations dealing with "removed" refseq record

nli at fhcrc.org nli at fhcrc.org
Fri Jun 8 23:22:15 CEST 2007


Hi, Francois,

If I remember correctly, we had a hard time finding up-to-date annotations from
Agilent. The annotation file we downloaded from Agilent was out of date. We
still update the annotation packages for each release, but the probeset-to-gene
mapping (recorded in mgug4122aACCNUM) hasn't been updated for quite a long
time. In other words, we only update the annotations for the genes. So, if the
mgug4122aACCNUM entry for a probeset is wrong or deprecated, the other
annotations for that probeset will be missing or incorrect.
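
Here is a rough sketch of the effect in base R. It uses toy lookup tables, not
the real package contents: the probe A_TOY_1, the accession NM_TOY_0001 and the
symbol ToyGene are made up for illustration, and the real build process may
differ in its details. It only assumes that gene-level annotations are looked
up through the stored accession.

```r
# Fixed probe -> accession mapping (stands in for mgug4122aACCNUM).
# A_TOY_1 / NM_TOY_0001 are hypothetical entries added for contrast.
probe2acc <- c(A_52_P49250  = "NM_010152",
               A_51_P216179 = "NM_010152",
               A_TOY_1      = "NM_TOY_0001")

# Gene-level annotation refreshed at each release, keyed by accession.
# The removed record NM_010152 is assumed to have no entry any more.
acc2symbol <- c(NM_TOY_0001 = "ToyGene")

# Look up each probe's symbol through its accession; an accession that is
# missing from the refreshed table yields NA.
probe2symbol <- acc2symbol[probe2acc]
names(probe2symbol) <- names(probe2acc)
probe2symbol

# Probes that have an accession but no symbol are candidates for a stale mapping.
stale <- names(probe2symbol)[is.na(probe2symbol)]
stale
```

The two Erbb2 probes come out as NA here because their accession no longer
resolves, which is the behaviour Francois reported.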

Could you please post the link to the up-to-date annotation file? We can
rebuild the annotation packages based on it. Your help would be highly
appreciated.

Thanks,

nianhua

Quoting Francois Pepin <fpepin at cs.mcgill.ca>:

> Hi,
> 
> I think the annotation system has problems dealing with RefSeq records that
> were removed.
> 
> This is looking at the Erbb2 gene in mouse (entrezID=13866) on the whole
> genome mouse chip from Agilent (annotation package: mgug4122a). From the
> annotations provided by Agilent, there are 2 probes that map to it:
> A_52_P49250 and A_51_P216179.
> 
> Currently, the annotations do not give any results for it:
> 
> > library(mgug4122a)
> > unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
>  A_52_P49250 A_51_P216179
>           NA           NA
> 
> The accession number that is given indeed points to NM_010152.
> 
> > unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
>  A_52_P49250 A_51_P216179
>  "NM_010152"  "NM_010152"
> 
> Looking at it on the NCBI website, it does point to Erbb2, but it also
> says: "This record was removed by RefSeq staff".
> 
> Not being entirely familiar with the process, I would point to this as a
> likely reason for the lack of annotations for those two probes.
> 
> I have not done an extensive check between the Agilent annotation and
> the ones in mgug4122a to see how many other probes might be hit by this.
> 
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
> LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
> LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
> LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
> LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [6] "methods"   "base"
> 
> other attached packages:
> mgug4122a
>  "1.16.0"
> 
> If there is any more information I can provide, please tell me.
> 
> Francois
> 