[BioC] ragene10st

Thu Mar 5 16:20:27 CET 2009

Hi Sebastien,

	I CCed this email to bioc because this is a very common question from
new custom CDF users.

	Our custom CDF doesn't use Affy's original probeset IDs. Instead, we
take the individual probes and reorganize them into meaningful probesets.
A probeset in a custom CDF already corresponds to an Entrez Gene, ENSG,
ENSE or RefSeq ID, etc., so an annotation package for a custom CDF is not
as important as it used to be. In most of my own analyses, I didn't even
use annotation packages.
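
	As a small illustration of that point, here is a sketch in base R that
recovers the Entrez Gene ID from an entrezg probeset ID by dropping the
"_at" suffix. The four IDs are the ones quoted further down in this
thread; the variable names are only for illustration.

```r
# Custom CDF (entrezg) probeset IDs as they appear in this thread
probesets <- c("112400_at", "113882_at", "113886_at", "113892_at")

# Drop the trailing "_at" to recover the Entrez Gene ID
entrez_ids <- sub("_at$", "", probesets)

# Named vector: probeset ID -> Entrez Gene ID
names(entrez_ids) <- probesets
print(entrez_ids)
```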

	The major reason we release custom CDFs is that they discard probes
that are shown to be wrong under the latest gene definitions. Moreover,
there are two additional major benefits for Bioconductor users. First, a
custom CDF provides a direct mapping between probes and
gene/exon/transcript/ref/ug. Second, users can analyze many new chips
(gene chip, exon chip and tiling chip) in the traditional way (rma,
dchip, etc.).
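
	A rough sketch of the "traditional way" with a custom CDF might look
like the following. It assumes the affy package and the matching custom
CDF package are installed and that you have a directory of CEL files.
The function name run_custom_cdf_rma and its arguments are made up for
this example, and the CDF name in the usage comment is a guess based on
the annotation object name seen in this thread, so check it against the
package you actually installed.

```r
# Sketch only: needs the 'affy' package, the matching custom CDF package,
# and a directory of CEL files. The function name is hypothetical.
run_custom_cdf_rma <- function(cel_dir, cdf_name) {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("the 'affy' package is not installed")
  }
  # cdfname makes ReadAffy use the custom CDF instead of the default one
  batch <- affy::ReadAffy(celfile.path = cel_dir, cdfname = cdf_name)
  # rma() summarizes per probeset; with an entrezg CDF each row is one gene
  eset <- affy::rma(batch)
  # Matrix of expression values, one row per custom probeset
  Biobase::exprs(eset)
}

# Usage (not run; both arguments are assumptions):
# expr <- run_custom_cdf_rma("path/to/celfiles", "ragene10stv1rnentrezg")
```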

	In your case, our custom CDF can only help if your analysis starts from
the CEL files. If you just want annotation for Affy's original
probesets, you have to stick to Marc's suggestion.
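
	If you stay with the original probesets, the "mrna_assignment" strings
quoted further down in this thread would first have to be reduced to
plain accessions. The following is a minimal base R sketch of that step,
assuming records are separated by "///" and fields by "//" as in the
quoted examples. The helper extract_accessions is hypothetical and not
part of AnnotationDbi.

```r
# One mrna_assignment value, taken from the example quoted in this thread
x <- paste("NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---",
           "/// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //",
           "Rn.217623 // ---")

# Hypothetical helper: pull the first field of every record and keep
# the unique accessions that match 'pattern' (RefSeq mRNA/ncRNA by default)
extract_accessions <- function(assignment, pattern = "^N[MR]_") {
  # Records are separated by "///", fields within a record by "//"
  records <- strsplit(assignment, "///", fixed = TRUE)[[1]]
  first_field <- vapply(strsplit(records, "//", fixed = TRUE),
                        function(f) trimws(f[1]), character(1))
  unique(first_field[grepl(pattern, first_field)])
}

refseq  <- extract_accessions(x)              # RefSeq accessions
ensembl <- extract_accessions(x, "^ENSRNOT")  # Ensembl rat transcripts
```

	A one-ID-per-probeset table built this way could then go into the
annotation file used for makeRATCHIP_DB, though which accession to keep
when a probeset has several is a decision you have to make yourself.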

Best,
Manhong

On Thu, 2009-03-05 at 15:45 +1100, Sebastien Gerega wrote:
> Hi Manhong,
> thank you for your help. I now understand that the probe IDs are 
> actually the Entrez IDs with "_at" pasted onto the end of the file. 
> However, given that I only have the Affy probe IDs - as in the original 
> ones in the form of:
> 
> 10700001
> 10700003
> 10700004
> 10700005
> 10700013
> 
> how can I use the annotation package? For example given the affy ID 
> "10700001" how can I obtain the Entrez ID and additional annotations?
> thanks for any advice you can offer!
> regards,
> Sebastien
> 
> Manhong Dai wrote:
> > Hi Sebastien,
> >
> >
> > 	Custom CDF version 11 is at
> > http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v11
> >
> > 	If you prefer entrez gene based cdf, it is at
> > http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/entrezg.asp then search RaGene10stv1 in the page.
> >
> >
> > 	In the custom CDF entrezg, the probeset ID is already the Entrez Gene ID. That's
> > why the probeset IDs you saw in the NuGO custom CDF version 10 annotation
> > package are not the same as the probeset IDs in Affy's original CDF
> > file.
> >
> >
> > Best,
> > Manhong
> >
> >   
> >> Date: Tue, 03 Mar 2009 16:08:33 +1100
> >> From: Sebastien Gerega <seb at gerega.net>
> >> Subject: Re: [BioC] ragene10st
> >> To: bioconductor at stat.math.ethz.ch
> >> Message-ID: <49ACBB51.8070904 at gerega.net>
> >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >>
> >> Thank you Marc and Manhong for your suggestions.
> >> I have attempted both methods and run into some problems. Firstly, I was 
> >> able to build ragene10st.db using the following code:
> >>
> >> source("http://bioconductor.org/biocLite.R")
> >> biocLite("rat.db0")
> >>
> >> library(AnnotationDbi)
> >> fname = "RaGene-1_0-st-v1.EDITED.txt"
> >> wdir = getwd()   
> >> makeRATCHIP_DB(affy=FALSE,
> >>     prefix="ragene10st",
> >>     fileName=fname,
> >>     baseMapType="eg",
> >>     outputDir = wdir,
> >>     version="1.0.0",
> >>     manufacturer = "Affymetrix",
> >>     chipName = "Rat Gene ST Array",
> >>     manufacturerUrl = "http://www.affymetrix.com")
> >>
> >> I then used this library for annotation of an analysis I performed. At 
> >> this point I realised that about one third of the 29171 probes were 
> >> assigned the gene symbol "RT1-C113". I realise this is due to the 
> >> annotation file used being in the wrong format. I had used the 
> >> "mrna_assignment" column which contains data appearing in a complex 
> >> format. Here are a couple examples:
> >> NM_001099458 // RefSeq // Rattus norvegicus similar to putative 
> >> pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39 
> >> // 0 ///
> >> ENSRNOT00000046204 // Rn.217623 // ---
> >> NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // --- 
> >> /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 // 
> >> Rn.217623 // ---
> >>
> >> Unfortunately for the Gene ST chips there are no columns that simply 
> >> contain genbank, unigene, or refseq IDs.
> >>
> >> So instead I tried Manhong's suggestion of using a custom CDF but there 
> >> is no custom CDF for rat gene ST arrays on the 
> >> http://brainarray.mbni.med.umich.edu/ website. However, if I follow the 
> >> link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to locate 
> >> an appropriate CDF. Unfortunately, upon further examination of this CDF 
> >> package it appears as though the wrong probe IDs have been used.
> >> For example:
> >>  > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
> >> $`112400_at`
> >> [1] "Nrg1"
> >>
> >> $`113882_at`
> >> [1] "Hemgn"
> >>
> >> $`113886_at`
> >> [1] "Kif1c"
> >>
> >> $`113892_at`
> >> [1] "Cml3"
> >>
> >> As far as I am aware the probe IDs used for rat gene ST arrays are in 
> >> the following format (8 digits without "_at"):
> >> 10700001
> >> 10700003
> >> 10700004
> >> 10700005
> >> 10700013
> >>
> >> Can anyone provide any advice for either of the two options?
> >> thanks,
> >> Sebastien
> >>     
> >
> >   
>