seb at gerega.net
Thu Mar 5 23:30:09 CET 2009
Thanks to all those who offered advice. I have now managed to create an
annotation file for rat gene ST arrays. It can be downloaded from:
in case anyone else is interested in using it.
Hooiveld, Guido wrote:
> Hi Sebastien,
> To follow up and clarify Manhong's remarks:
> Philip, my colleague, prepared the annotation files for many of the
> Entrez-based remapped CDF files.
> The remapping of the probes has been done by Manhong et al. at the MBNI,
> and the mapped Entrez IDs are then used by Philip to create the
> corresponding annotation files (using the annotation/SQLForge library),
> which are made available through the link you mentioned below.
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
>> Manhong Dai
>> Sent: 03 March 2009 15:36
>> To: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] ragene10st
>> Hi Sebastien,
>> Custom CDF version 11 is at
>> If you prefer the Entrez Gene based CDF, it is at
>> mCDF/11.0.1/entrezg.asp; then search for RaGene10stv1 on the page.
>> In the entrezg custom CDF, the probeset ID is already the
>> Entrez Gene ID. That's why the probeset IDs you saw in the NuGO
>> Custom CDF version 10 annotation package are not the same as
>> the probeset IDs in Affy's original CDF file.
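To illustrate the difference between the two ID schemes, here is a minimal base R sketch. It assumes that entrezg probeset IDs have the form "<Entrez Gene ID>_at" and that the original Gene ST IDs are 8 digits with no suffix, as described elsewhere in this thread. The IDs below are hypothetical and are not taken from any real package.

```r
# Hypothetical probeset IDs, for illustration only.
ids <- c("24233_at", "10700001", "116510_at")

# Assumption: entrezg custom CDF IDs look like "<Entrez Gene ID>_at",
# whereas the original Gene ST IDs are 8 digits with no suffix.
is_entrezg <- grepl("^[0-9]+_at$", ids)
is_gene_st <- grepl("^[0-9]{8}$", ids)

# For entrezg-style IDs, dropping the suffix leaves the Entrez Gene ID;
# other IDs are set to NA.
entrez_id <- ifelse(is_entrezg, sub("_at$", "", ids), NA_character_)
entrez_id
```

A check like this may help confirm which ID scheme a given annotation package uses before merging it with expression results.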
>>> Date: Tue, 03 Mar 2009 16:08:33 +1100
>>> From: Sebastien Gerega <seb at gerega.net>
>>> Subject: Re: [BioC] ragene10st
>>> To: bioconductor at stat.math.ethz.ch
>>> Message-ID: <49ACBB51.8070904 at gerega.net>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>> Thank you Marc and Manhong for your suggestions.
>>> I have attempted both methods and run into some problems. Firstly, I
>>> was able to build ragene10st.db using the following code:
>>> fname = "RaGene-1_0-st-v1.EDITED.txt"
>>> wdir = getwd()
>>> outputDir = wdir,
>>> manufacturer = "Affymetrix",
>>> chipName = "Rat Gene ST Array",
>>> manufacturerUrl = "http://www.affymetrix.com")
>>> I then used this library for annotation of an analysis I performed. At
>>> this point I realised that about one third of the 29171 probes were
>>> assigned the gene symbol "RT1-C113". I realise this is due to the
>>> annotation file used being in the wrong format. I had used the
>>> "mrna_assignment" column which contains data appearing in a complex
>>> format. Here are a couple examples:
>>> NM_001099458 // RefSeq // Rattus norvegicus similar to putative
>>> pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39
>>> // 0 ///
>>> ENSRNOT00000046204 // Rn.217623 // ---
>>> NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---
>>> /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //
>>> Rn.217623 // ---
>>> Unfortunately, for the Gene ST chips there are no columns that simply
>>> contain GenBank, UniGene, or RefSeq IDs.
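One possible workaround, sketched here in base R, is to extract a single accession from each "mrna_assignment" string before building the package. The sketch assumes that records are separated by "///" and fields by "//", as in the examples above, and that RefSeq mRNA accessions start with "NM_" or "XM_". The helper name first_refseq is hypothetical and is not part of any Bioconductor package.

```r
# Hypothetical helper: return the first RefSeq mRNA accession found in an
# Affymetrix "mrna_assignment" string, or NA if there is none.
first_refseq <- function(assignment) {
  # Records are separated by "///", fields within a record by "//".
  records <- strsplit(assignment, "///", fixed = TRUE)[[1]]
  accessions <- vapply(
    strsplit(records, "//", fixed = TRUE),
    function(fields) trimws(fields[1]),
    character(1)
  )
  # Assumption: RefSeq mRNA accessions start with NM_ or XM_.
  hits <- accessions[grepl("^[NX]M_[0-9]+", accessions)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

# One of the example strings quoted above, rejoined onto a single line.
example <- paste(
  "NM_001099461 // Rn.217622 // --- ///",
  "ENSRNOT00000041455 // Rn.217622 // --- ///",
  "ENSRNOT00000046204 // Rn.217623 // ---"
)
first_refseq(example)
```

Applying such a helper to every row would give a two-column table of probe ID and accession, which is closer to the simple input that the annotation-building code expects. Taking only the first accession is a design choice that discards the remaining assignments.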
>>> So instead I tried Manhong's suggestion of using a custom CDF but
>>> there is no custom CDF for rat gene ST arrays on the
>>> http://brainarray.mbni.med.umich.edu/ website. However, if I follow
>>> the link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to
>>> locate an appropriate CDF. Unfortunately, upon further examination of
>>> this CDF package it appears as though the wrong probe IDs have been used.
>>> For example:
>>> > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
>>>  "Nrg1"
>>>  "Hemgn"
>>>  "Kif1c"
>>>  "Cml3"
>>> As far as I am aware the probe IDs used for rat gene ST arrays are in
>>> the following format (8 digits without "_at"):
>>> Can anyone provide any advice for either of the two options?
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor