[BioC] HuGene annotation and htmls

cstrato cstrato at aon.at
Thu Apr 16 20:25:45 CEST 2009


Dear Marc

Please allow me to make some additional comments:
As I have mentioned earlier, with release 4 (r4) Affymetrix has 
converted HuGene, MoGene and RaGene arrays to exon arrays. For exon 
arrays Affymetrix supplies both transcript.csv and probeset.csv 
annotation files. Thus Affymetrix has not stopped releasing the 
transcript.csv annotation files but has added the probeset.csv 
annotation files for these arrays, allowing both 
"transcript_cluster_id"s and "probeset_id"s as identifiers.

The most important change was made to the *.PGF files:
While the probes in, for example, "HuGene-1_0-st-v1.r3.pgf" (and
"HuGene-1_0-st-v1.r3.cdf") are grouped by "transcript_cluster_id", the
probes in "HuGene-1_0-st-v1.r4.pgf" are grouped by the "probeset_id"s of
the new probeset annotation file "HuGene-1_0-st-v1.na28.hg18.probeset.csv".
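
Since the r4 scheme groups probes by "probeset_id" while the hugene10st.db
keys appear to be transcript cluster IDs, probeset-level IDs have to be
mapped back to their transcript clusters before they can be annotated.
The following is only a minimal base-R sketch of that lookup. It assumes
the probeset annotation table has "probeset_id" and
"transcript_cluster_id" columns, and the helper name probesetToCluster()
is hypothetical (it is not part of xps, annotate or AnnotationDbi):

```r
## Hypothetical helper (not part of xps, annotate or AnnotationDbi):
## map probeset IDs to their transcript cluster IDs, using an annotation
## table assumed to contain "probeset_id" and "transcript_cluster_id".
probesetToCluster <- function(probeset_ids, annot) {
  needed <- c("probeset_id", "transcript_cluster_id")
  if (!all(needed %in% names(annot)))
    stop("annotation table lacks columns: ",
         paste(setdiff(needed, names(annot)), collapse = ", "))
  ids <- as.character(probeset_ids)
  # position of each requested ID in the table; NA if it is absent
  idx <- match(ids, as.character(annot$probeset_id))
  out <- as.character(annot$transcript_cluster_id[idx])
  names(out) <- ids  # keep the probeset IDs as names
  out
}
```

To use it on real data, the probeset annotation file could be read with
read.csv(..., comment.char = "#", colClasses = "character"), assuming all
of its header lines start with "#"; please check the column names of the
resulting table before relying on the mapping. The returned transcript
cluster IDs can then be passed to getSYMBOL() as usual.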

Best regards
Christian


Marc Carlson wrote:
> Hi guys,
>
> So something confusing has happened with the HuGene, MoGene and RaGene
> platforms.  With revision 4 of these platforms, Affymetrix has decided
> to stop releasing the transcript.csv file, which identifies the
> relationships between the transcript cluster IDs and the genes
> represented on the platform, and has switched to releasing a
> probeset.csv file instead, which relates probeset IDs to genes.  This
> results in a massive expansion in the number of identifiers used for
> this platform.  So, in order to make things clearer, we have now
> forked this package in the development branch so that there is now a
> "transcriptcluster" package (based on revision 3) and a "probeset"
> package (based on revision 4); use whichever matches the way you need
> the probe identifiers to be arranged.  And of course, if you need yet
> another mapping, you can always make a new package as needed using the
> SQLForge code in AnnotationDbi.
>
>
>   Marc
>
>
>
> cstrato wrote:
>> Dear Mayte
>>
>> Everything is fine with your code; there is nothing to worry about.
>>
>> If you look at the "gene_assignment" column of
>> "HuGene-1_0-st-v1.na28.hg18.transcript.csv", you will see that many
>> transcript clusters have no gene assigned, so getSYMBOL() returns NA
>> for them, e.g.:
>>
>> > getSYMBOL("7896740", "hugene10st")
>> 7896740
>> "OR4F17"
>> > getSYMBOL("7896746", "hugene10st")
>> 7896746
>>      NA
>>
>> Best regards
>> Christian
>>
>>
>> Mayte Suarez-Farinas wrote:
>>> You are right, James!
>>> With the keys James sent, the hugene10st package works just fine,
>>> so it looks like the "error" comes from my use of xps.
>>>
>>> Here is my code:
>>>
>>> library(xps)
>>>
>>> ### define directories:
>>> # directory containing Affymetrix library files (*.clf, *.pgf)
>>> libdir <- "/Users/Mayte/Rlibrary/AffyDB/libraryfiles"
>>> # directory containing Affymetrix annotation files (*.csv)
>>> anndir <- "/Users/Mayte/Rlibrary/AffyDB/Annotation"
>>> # directory where the ROOT scheme files are stored
>>> scmdir <- "/Users/Mayte/Rlibrary/AffyDB/ROOTSchemes"
>>>
>>> # build the ROOT scheme file from the r4 library and na28 annotation files
>>> scheme.hugene10stv1r4.na28 <- import.exon.scheme("Scheme_HuGene10stv1r4_na28",
>>>     filedir    = scmdir,
>>>     layoutfile = paste(libdir, "HuGene-1_0-st-v1.r4.clf", sep = "/"),
>>>     schemefile = paste(libdir, "HuGene-1_0-st-v1.r4.pgf", sep = "/"),
>>>     probeset   = paste(anndir, "HuGene-1_0-st-v1.na28.hg18.probeset.csv", sep = "/"),
>>>     transcript = paste(anndir, "HuGene-1_0-st-v1.na28.hg18.transcript.csv", sep = "/"))
>>>
>>> # open the scheme file just created
>>> scheme.hugene10stv1r4 <- root.scheme(paste(scmdir,
>>>     "Scheme_HuGene10stv1r4_na28.root", sep = "/"))
>>> # import the first 8 CEL files; PD (defined elsewhere) holds the
>>> # CEL file names in its "CELfile" column
>>> G1ST_data <- import.data(scheme.hugene10stv1r4, "Pamela_G1ST_dataxps",
>>>     celdir = getwd(), celfiles = as.character(PD[1:8, "CELfile"]),
>>>     verbose = FALSE)
>>> # RMA summarization at transcript-cluster level
>>> G1ST_rma_xps <- rma(G1ST_data, "Pamela_G1ST_rma_t",
>>>     background = "antigenomic", option = "transcript",
>>>     exonlevel = "core+affx", normalize = TRUE)
>>>
>>> The "featureNames" of the data (i.e. the keys) can be taken as:
>>>
>>> keys <- as.character(exprs(G1ST_rma_xps)$UnitName)
>>>
>>> but almost a third of them do not have a symbol:
>>>
>>> > sum(!is.na(getSYMBOL(keys, "hugene10st")))
>>> [1] 19899
>>> > sum(is.na(getSYMBOL(keys, "hugene10st")))
>>> [1] 9027
>>>
>>> Is this OK, or is there a mistake in my code?
>>>
>>> Thanks in advance for everybody's help,
>>> and sorry for bothering you so many times!
>>>
>>> Mayte
>>>
>>> On Apr 10, 2009, at 10:55 AM, James W. MacDonald wrote:
>>>
>>>> I wonder if this is a problem with how the package was built. The 
>>>> numbers that Marc supplied are the Exon Probeset IDs, but the Lkeys 
>>>> of the hugene10st.db package seem to be what Affy calls the
>>>> Transcript Cluster IDs.
>>>>
>>>> > keys <- c("7903188","7903203")
>>>> > getSYMBOL(keys, "hugene10st")
>>>> 7903188 7903203
>>>> "PTBP2"  "SNX7"
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>> Mayte Suarez-Farinas wrote:
>>>>> I meant that the usual functions from annotate do not work.
>>>>> When I ran your code, I got:
>>>>>
>>>>> > library("annotate")
>>>>> > library("hugene10st.db")
>>>>> > keys = c("7903193","7903204")
>>>>> > getSYMBOL(keys, "hugene10st")
>>>>> 7903193 7903204
>>>>>      NA      NA
>>>>> > lookUp(keys, "hugene10st" , "CHR")
>>>>> $`7903193`
>>>>> [1] NA
>>>>> $`7903204`
>>>>> [1] NA
>>>>> > lookUp(keys, "hugene10st" , "ENTREZID")
>>>>> $`7903193`
>>>>> [1] NA
>>>>> $`7903204`
>>>>> [1] NA
>>>>> > sessionInfo()
>>>>> R version 2.8.1 (2008-12-22)
>>>>> i386-apple-darwin8.11.1
>>>>>
>>>>> locale:
>>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] splines   tools     stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] hugene10st.db_1.0.2  statmod_1.3.8        beadarray_1.10.0     sma_0.5.15           hwriter_1.0
>>>>>  [6] affycoretools_1.14.1 annaffy_1.14.0       KEGG.db_2.2.5        biomaRt_1.16.0       GOstats_2.8.0
>>>>> [11] Category_2.8.4       RBGL_1.18.0          GO.db_2.2.5          RSQLite_0.7-1        DBI_0.2-4
>>>>> [16] graph_1.20.0         limma_2.16.4         affyQCReport_1.20.0  geneplotter_1.20.0   annotate_1.20.1
>>>>> [21] AnnotationDbi_1.5.18 lattice_0.17-17      RColorBrewer_1.0-2   affyPLM_1.18.1       preprocessCore_1.4.0
>>>>> [26] xtable_1.5-4         simpleaffy_2.18.0    gcrma_2.14.1         matchprobes_1.14.1   genefilter_1.22.0
>>>>> [31] survival_2.34-1      affy_1.20.2          Biobase_2.2.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] GSEABase_1.4.0     KernSmooth_2.22-22 RCurl_0.94-1       XML_2.1-0          affyio_1.10.1
>>>>> [6] cluster_1.11.11    grid_2.8.1         xps_1.2.8
>>>>>
>>>>> On Apr 9, 2009, at 5:26 PM, Marc Carlson wrote:
>>>>>> Hi Mayte,
>>>>>>
>>>>>> I can't tell from your post what you tried to do, or even what exactly
>>>>>> you need to know.  Please give us the code you were trying to use, along
>>>>>> with an example that didn't behave the way you expected it to, and the
>>>>>> results of calling sessionInfo() after you did that.  You can find
>>>>>> other helpful tips in the posting guide:
>>>>>>
>>>>>> http://www.bioconductor.org/docs/postingGuide.html
>>>>>>
>>>>>> What little I can discern from your post I will try to answer.  To use
>>>>>> getSYMBOL() or lookUp(), you first need to make sure that you have
>>>>>> loaded the annotate package.  Then you need to call them correctly.
>>>>>> Here is an example that I ran using the very latest version of the
>>>>>> hugene10st.db package.
>>>>>>
>>>>>> library("annotate")
>>>>>> library("hugene10st.db")
>>>>>> keys = c("7903193","7903204")
>>>>>>
>>>>>> getSYMBOL(keys, "hugene10st")
>>>>>>
>>>>>> lookUp(keys, "hugene10st" , "CHR")
>>>>>> lookUp(keys, "hugene10st" , "ENTREZID")
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>>   Marc
>>>>>>
>>>>>> Mayte Suarez-Farinas wrote:
>>>>>>> I am learning to work with the HuGene 1.0 ST chips.
>>>>>>> I was able to use xps to read and preprocess the files,
>>>>>>> and then I converted the result to an ExpressionSet to use
>>>>>>> limma for modelling.
>>>>>>> The next step is where I got stuck: the annotation.
>>>>>>> I loaded library("hugene10st.db"), but the usual functions
>>>>>>> for creating HTML annotation pages do not seem to work on this chip.
>>>>>>> I also tried to get each component using getSYMBOL and lookUp,
>>>>>>> with no success.
>>>>>>> What's the way to go?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Mayte
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>> -- 
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> Douglas Lab
>>>> University of Michigan
>>>> Department of Human Genetics
>>>> 5912 Buhl
>>>> 1241 E. Catherine St.
>>>> Ann Arbor MI 48109-5618
>>>> 734-615-7826