[BioC] ragene10st

Sebastien Gerega seb at gerega.net
Tue Mar 3 06:08:33 CET 2009

Thank you, Marc and Manhong, for your suggestions.
I have tried both methods and run into some problems. Firstly, I was 
able to build ragene10st.db using the following code:


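library(AnnotationDbi)   # provides the SQLForge functions such as makeRATCHIP_DB()
fname = "RaGene-1_0-st-v1.EDITED.txt"
wdir = getwd()
makeRATCHIP_DB(   # note: the leading arguments of this call were lost from the archived message
    outputDir = wdir,
    manufacturer = "Affymetrix",
    chipName = "Rat Gene ST Array",
    manufacturerUrl = "http://www.affymetrix.com")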

I then used this package to annotate an analysis I performed. At 
this point I realised that about one third of the 29171 probes were 
assigned the gene symbol "RT1-C113". I realise this is because the 
annotation file I used was in the wrong format: I had used the 
"mrna_assignment" column, which contains data in a complex 
format. Here are a couple of examples:
NM_001099458 // RefSeq // Rattus norvegicus similar to putative 
pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39 
// 0 ///
ENSRNOT00000046204 // Rn.217623 // ---
NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // --- 
/// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 // 
Rn.217623 // ---

Unfortunately, for the Gene ST chips there are no columns that simply 
contain GenBank, UniGene or RefSeq IDs.
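One possible workaround, sketched here under assumptions that have not been checked against every record in the file: the "mrna_assignment" values above appear to separate records with " /// " and fields within a record with " // ", with the accession as the first field. If that holds, the leading accession of each record can be split out. The helper name firstAccessions() is hypothetical and not part of any Bioconductor package.

```r
## Hypothetical helper (not from any package): extract the accession that
## starts each record of an Affymetrix "mrna_assignment" value.
## Assumes records are separated by " /// " and fields by " // ".
firstAccessions <- function(x) {
  records <- strsplit(as.character(x), " /// ", fixed = TRUE)
  lapply(records, function(r) {
    ids <- trimws(sub(" //.*$", "", r))                 # keep the text before the first " //"
    ids <- ids[!is.na(ids) & ids != "" & ids != "---"]  # drop empty or missing entries
    unique(ids)                                         # collapse repeated accessions
  })
}

example <- paste("NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---",
                 "/// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 // Rn.217623 // ---")
## gives NM_001099461, ENSRNOT00000041455 and ENSRNOT00000046204
firstAccessions(example)[[1]]
```

Applied to the whole column (assuming it is named mrna_assignment, e.g. `firstAccessions(df$mrna_assignment)`), this returns one character vector per probeset; RefSeq-only IDs could then be kept with something like `grep("^[NX]M_", ids, value = TRUE)`.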

So instead I tried Manhong's suggestion of using a custom CDF, but there 
is no custom CDF for rat gene ST arrays on the 
http://brainarray.mbni.med.umich.edu/ website. However, if I follow the 
link to http://nugo-r.bioinformatics.nl/NuGO_R.html I can locate 
an appropriate CDF. Unfortunately, on closer examination of this CDF 
package, it appears that the wrong probe IDs have been used.
For example:
 > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
[1] "Nrg1"

[1] "Hemgn"

[1] "Kif1c"

[1] "Cml3"

As far as I am aware, the probe IDs used for rat gene ST arrays are in 
the following format (8 digits without "_at"):

Can anyone offer advice on either of these two options?
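To check which ID convention a given package actually uses, the probe IDs can be classified with regular expressions. This is a minimal sketch: the helper name idConvention() is hypothetical, the example IDs are made up, and it assumes, as stated above, that native rat Gene ST IDs are exactly 8 digits with no "_at" suffix.

```r
## Hypothetical helper: label each probe ID by its naming convention.
## Assumes native Gene ST IDs are exactly 8 digits and custom-CDF IDs end in "_at".
idConvention <- function(ids) {
  ifelse(grepl("^[0-9]{8}$", ids), "8-digit",
         ifelse(grepl("_at$", ids), "_at suffix", "other"))
}

## Made-up IDs, for illustration only
idConvention(c("12345678", "114499_at", "AFFX-BioB-5_at"))
## gives "8-digit", "_at suffix", "_at suffix"
```

With the NuGO package installed and loaded, something like `table(idConvention(names(as.list(ragene10stv1rnentrezgSYMBOL))))` would summarise the IDs it uses.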

Marc Carlson wrote:
> Well, one way is to navigate Affymetrix's website and grab the annotation
> file:
> http://www.affymetrix.com/support/technical/annotationfilesmain.affx
> Or you could use Martin Morgan's clever AffyCompatible package, which
> will let you get the data you need more directly.
> ## The 2nd approach would go something like this (adapted from Martin's
> ## vignette):
> library(AffyCompatible)
> password <- "your_psswd"
> ## connect to NetAffx with your Affymetrix account credentials
> rsrc <- NetAffxResource(user="you@someplace.com", password=password)
> head(names(rsrc))          # names of the available arrays
> affxDescription(rsrc[["RaGene-1_0-st-v1"]])
> annos <- rsrc[["RaGene-1_0-st-v1"]]
> annos
> sapply(affxAnnotation(annos), force)
> ## pick the probeset annotation file in CSV format
> anno <- rsrc[["RaGene-1_0-st-v1", "Probeset Annotations, CSV Format"]]
> ## content=FALSE returns the path of the downloaded file, not its contents
> fl <- readAnnotation(rsrc, annotation=anno, content=FALSE)
> fl
> ## open the CSV inside the downloaded zip archive
> conn <- unz(fl, "RaGene-1_0-st-v1.na27.2.rn4.probeset.csv")
> ## Then get a data frame with the contents of the file in it
> df = read.table(conn, header=TRUE, skip=18, sep=",")
>   Marc
> Sebastien Gerega wrote:
>> Hi Marc,
>> I guess the problem is that I don't know which
>> annotation file to use. I can't seem to find any that have the
>> appropriate columns. What files were used to generate mogene10st.db
>> and hugene10st.db? I can find appropriate annotations for Affy 3'
>> arrays but not for the Gene ST ones....
>> thanks again,
>> Sebastien
>> Marc Carlson wrote:
>>> Hi Sebastien,
>>> The affy parameter is just a shortcut for Affymetrix expression
>>> arrays. If you want to use that parameter, you can download the
>>> appropriate annotation library file from the Affymetrix website (which
>>> you probably have to get anyhow), point to it in the parameter, and then
>>> call the function. SQLForge will then parse this file, extracting only
>>> the probeset IDs and the Entrez Gene, RefSeq and UniGene IDs, in order
>>> to sort out what all these genes are and thus generate the files that
>>> are described in the vignette from this Affymetrix file. This will work
>>> as long as this particular annotation file is formatted similarly to
>>> earlier ones. But this parameter is purely for convenience and not at
>>> all necessary for using SQLForge. A lot of people use affy, so I just
>>> added it to make things easier for that majority of users.
>>> You can almost as easily grab that same Affymetrix annotation
>>> library file and make the tab-delimited files that I described
>>> yourself. All you really need is a file that gives the gene identity of
>>> the different probesets, so you can ignore the vast majority of the
>>> data in the file. If you have that, then you have all that you really
>>> need to proceed. For most platforms this just means selecting two
>>> of the columns and creating a tab-delimited file from them, which you
>>> then feed to the function.
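The manual route Marc describes (select two columns, write a tab-delimited file) could look roughly like the sketch below. The function name writeProbeMap(), the header-less two-column layout and the rule of skipping "---" entries are assumptions for illustration; the SQLForge vignette should be checked for the exact format a given version expects.

```r
## Hypothetical sketch: write a header-less, two-column, tab-delimited file
## (probeset ID <TAB> gene identifier). Name and filtering rules are illustrative.
writeProbeMap <- function(probeIds, geneIds, file) {
  keep <- !is.na(geneIds) & geneIds != "" & geneIds != "---"  # skip unassigned probesets
  out <- data.frame(probe = as.character(probeIds[keep]),
                    gene = as.character(geneIds[keep]),
                    stringsAsFactors = FALSE)
  write.table(out, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(out)   # return the rows written, without printing them
}

## Made-up IDs, purely for illustration
f <- tempfile(fileext = ".txt")
writeProbeMap(c("probe1", "probe2", "probe3"),
              c("NM_001099461", "---", "NM_001099458"), f)
readLines(f)
```

The resulting file would then presumably be passed to makeRATCHIP_DB() as its input file, with a baseMapType matching the kind of identifier in the second column.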
>>> Please let me know if you have more questions,
>>>   Marc
>>> Sebastien Gerega wrote:
>>>> Hi Marc, and thanks for your help. I've had a look at the SQLForge
>>>> vignette and there are still a couple of issues that are unclear to me.
>>>> Firstly, for the Rat Gene ST arrays, is it possible to use any of the
>>>> annotation files from the Affymetrix site as input for makeRATCHIP_DB
>>>> in AnnotationDbi? If not, and the list of probes has to be created
>>>> manually, what is the best way to go about doing this?
>>>> thanks again,
>>>> Sebastien
>>>> Marc Carlson wrote:
>>>>> Hi Sebastien,
>>>>> We have just never had anyone ask for one before. However, you can
>>>>> make a package for yourself if you follow the instructions in the
>>>>> SQLForge vignette in the AnnotationDbi package:
>>>>> http://www.bioconductor.org/packages/devel/bioc/html/AnnotationDbi.html
>>>>> Please let me know if you have further questions regarding this.
>>>>>   Marc
>>>>> Sebastien Gerega wrote:
>>>>>> Hi,
>>>>>> I have been analysing human and mouse gene ST chips using a
>>>>>> combination of the Aroma package and the hugene10st.db and
>>>>>> mogene10st.db annotation packages. Now I am attempting to do the
>>>>>> same with some rat gene ST chips but have been unable to find the
>>>>>> corresponding annotations. Why is there no ragene10st?
>>>>>> thanks,
>>>>>> Sebastien
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
