[BioC] ragene10st

Tue Mar 3 19:52:46 CET 2009

Hi Sebastien,
To follow up on and clarify Manhong's remarks:
Philip, my colleague, prepared the annotation files for many of the
Entrez-based remapped CDF files.
The remapping of the probes was done by Manhong et al. at the MBNI,
and Philip then uses the mapped Entrez IDs to create the
corresponding annotation files (using the SQLForge tools in AnnotationDbi),
which are made available through the link you mentioned below.
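
As a small illustration of Manhong's point that the probeset IDs in the
entrezg custom CDF are already Entrez Gene IDs: assuming the IDs carry an
"_at" suffix, as in the as.list() output quoted below, the Entrez IDs can
be recovered with base R. This is only a sketch; the variable names are
made up for the example and are not part of any package.

```r
# Probeset IDs copied from the quoted output below (Entrez Gene ID + "_at").
probesets <- c("112400_at", "113882_at", "113886_at", "113892_at")

# Assumption: every ID ends in "_at"; strip it to get the Entrez Gene ID.
entrez_ids <- sub("_at$", "", probesets)

# Named character vector: probeset ID -> Entrez Gene ID.
probe2entrez <- setNames(entrez_ids, probesets)
print(probe2entrez)
```

These IDs will therefore never match the 8-digit transcript cluster IDs
from the original Affymetrix annotation, which is expected rather than an
error in the package.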

HTH,
Guido

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
> Manhong Dai
> Sent: 03 March 2009 15:36
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] ragene10st
> 
> Hi Sebastien,
> 
> 
> 	Custom CDF version 11 is at
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v11
> 
> 	If you prefer entrez gene based cdf, it is at 
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/entrezg.asp
> then search for RaGene10stv1 on the page.
> 
> 
> 	In custom CDF entrezg, the probeset id is already 
> the Entrez Gene ID. That is why the probeset IDs you saw in the NuGO
> Custom CDF version 10 annotation package are not the same as
> the probeset IDs in Affymetrix's original CDF file.
> 
> 
> Best,
> Manhong
> 
> > Date: Tue, 03 Mar 2009 16:08:33 +1100
> > From: Sebastien Gerega <seb at gerega.net>
> > Subject: Re: [BioC] ragene10st
> > To: bioconductor at stat.math.ethz.ch
> > Message-ID: <49ACBB51.8070904 at gerega.net>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > 
> > Thank you Marc and Manhong for your suggestions.
> > I have attempted both methods and run into some problems. Firstly, I
> > was able to build ragene10st.db using the following code:
> > 
> > source("http://bioconductor.org/biocLite.R")
> > biocLite("rat.db0")
> > 
> > library(AnnotationDbi)
> > fname = "RaGene-1_0-st-v1.EDITED.txt"
> > wdir = getwd()   
> > makeRATCHIP_DB(affy=FALSE,
> >     prefix="ragene10st",
> >     fileName=fname,
> >     baseMapType="eg",
> >     outputDir = wdir,
> >     version="1.0.0",
> >     manufacturer = "Affymetrix",
> >     chipName = "Rat Gene ST Array",
> >     manufacturerUrl = "http://www.affymetrix.com")
> > 
> > I then used this library to annotate an analysis I performed. At
> > this point I realised that about one third of the 29171 probes were
> > assigned the gene symbol "RT1-C113". I realise this is because the
> > annotation file I used was in the wrong format. I had used the
> > "mrna_assignment" column, which contains data in a complex
> > format. Here are a couple of examples:
> > NM_001099458 // RefSeq // Rattus norvegicus similar to putative
> > pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39
> > // 0 ///
> > ENSRNOT00000046204 // Rn.217623 // ---
> > NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---
> > /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //
> > Rn.217623 // ---
> > 
> > Unfortunately for the Gene ST chips there are no columns that simply
> > contain GenBank, UniGene, or RefSeq IDs.
> > 
> > So instead I tried Manhong's suggestion of using a custom CDF, but
> > there is no custom CDF for rat gene ST arrays on the
> > http://brainarray.mbni.med.umich.edu/ website. However, if I follow
> > the link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to
> > locate an appropriate CDF. Unfortunately, upon further examination of
> > this CDF package it appears as though the wrong probe IDs have been used.
> > For example:
> >  > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
> > $`112400_at`
> > [1] "Nrg1"
> > 
> > $`113882_at`
> > [1] "Hemgn"
> > 
> > $`113886_at`
> > [1] "Kif1c"
> > 
> > $`113892_at`
> > [1] "Cml3"
> > 
> > As far as I am aware the probe IDs used for rat gene ST arrays are in
> > the following format (8 digits without "_at"):
> > 10700001
> > 10700003
> > 10700004
> > 10700005
> > 10700013
> > 
> > Can anyone provide any advice for either of the two options?
> > thanks,
> > Sebastien
> 