[BioC] Human Gene ST 1.0 probeset controls

Benilton Carvalho bcarvalh at jhsph.edu
Mon Nov 9 16:37:00 CET 2009


Yes. Everything but the Prof part is correct. ;)
b

On Nov 9, 2009, at 1:17 PM, Javier Pérez Florido wrote:

> Thanks Prof. Carvalho,
> Is the rest of my e-mail correct?
> Thanks,
> Javier
>
>
> Benilton Carvalho wrote:
>> XM_001714578 was replaced by NM_001136561.
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
>>
>> b
>>
>> On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:
>>
>>>
>>> OK,
>>> So, to sum up (and to check that I understand the Human Gene ST 1.0
>>> array): summarizing to the gene level means that the several
>>> probesets that compose a gene are summarized together. To summarize
>>> to the gene level when normalizing, I executed:
>>> OligoEset<-rma(OligoRaw,target="core")
>>> and I got 33297 genes (transcript ids).
>>>
>>> Using the following query on pd.hugene.1.0.st.v1:
>>> dbListTables(conn)
>>> dbListFields(conn,"type_dict")
>>> info2<-"SELECT * from type_dict"
>>> result<-dbGetQuery(conn,info2)
>>>
>>> I got:
>>> #  type                   type_id
>>> #1    1                      main
>>> #2    2             control->affx
>>> #3    3             control->chip
>>> #4    4 control->bgp->antigenomic
>>> #5    5     control->bgp->genomic
>>> #6    6            normgene->exon
>>> #7    7          normgene->intron
>>> #8    8  rescue->FLmRNA->unmapped
>>>
>>> I also executed the following query:
>>> conn<-db(pd.hugene.1.0.st.v1)
>>> dbListTables(conn)
>>> info = dbGetQuery(conn, paste(
>>>             "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>>             "FROM featureSet, core_mps, type_dict",
>>>             "WHERE featureSet.fsetid=core_mps.fsetid",
>>>             "AND featureSet.type=type_dict.type"))
>>>
>>> I have a complete processed example (it is summarized to the gene
>>> level, and it has the ACC number, Symbol information, etc. for each
>>> transcript id). I wanted to reproduce the example by myself using
>>> the raw data. When matching the transcript_id field given by the
>>> above query against the transcript_id given by the example data set,
>>> the following information can be extracted:
>>>    • control->affx are related to other-spike and AFFX probe sets
>>> (57 probe sets)
>>>    • normgene->exon are related to 1195 pos_control probe sets
>>>    • normgene->intron are related to 2904 neg_control probe sets
>>> So, I suppose that there are about 4156 control transcripts
>>> (57 + 1195 + 2904).
>>>
>>> Since I summarized to the gene level, I have used the annotation
>>> file "hugene10sttranscriptcluster.db". I've tried to get the ACC
>>> number and the Symbol for some transcript_ids. The idea was to check
>>> whether the results were the same as in the example I have. For
>>> example:
>>>
>>> hugene10sttranscriptclusterACCNUM[["7912580"]]
>>> I get "NM_001136561", but in the example, the accession number is
>>> XM_001714578. However, I get the same result for Symbol:
>>> hugene10sttranscriptclusterSYMBOL[["7912580"]] :  LOC440563
>>>
>>> Why are there different accession numbers for the same transcript_id?
>>>
>>> Thanks again,
>>> Javier
>>>
>>>> sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
>>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>>> [5] LC_TIME=Spanish_Spain.1252
>>>
>>> attached base packages:
>>> [1] tools     tcltk     stats     graphics  grDevices utils     datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>> [1] pd.hugene.1.0.st.v1_3.0.0            oligoClasses_1.8.0
>>> [3] hugene10stprobeset.db_4.0.1          hugene10sttranscriptcluster.db_4.0.1
>>> [5] org.Hs.eg.db_2.3.6                   oneChannelGUI_1.12.0
>>> [7] preprocessCore_1.8.0                 GOstats_2.12.0
>>> [9] RSQLite_0.7-3                        DBI_0.2-4
>>> [11] graph_1.24.0                         Category_2.12.0
>>> [13] AnnotationDbi_1.8.0                  tkWidgets_1.24.0
>>> [15] DynDoc_1.24.0                        widgetTools_1.24.0
>>> [17] affylmGUI_1.20.0                     affyio_1.14.0
>>> [19] affy_1.24.0                          limma_3.2.1
>>> [21] Biobase_2.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] annotate_1.24.0   Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
>>> [5] GSEABase_1.8.0    IRanges_1.4.0     RBGL_1.20.0       splines_2.10.0
>>> [9] survival_2.35-7   XML_2.6-0         xtable_1.5-5
>>>>
>>>
>>>
>>> Benilton Carvalho wrote:
>>>>
>>>> Hi Javier,
>>>>
>>>> This is what you want to do:
>>>>
>>>> info = dbGetQuery(conn, paste(
>>>>             "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>>>             "FROM featureSet, core_mps, type_dict",
>>>>             "WHERE featureSet.fsetid=core_mps.fsetid",
>>>>             "AND featureSet.type=type_dict.type"))
>>>>
>>>> I'll make sure that, in the next releases, the users are not
>>>> expected to figure out queries like this.
>>>>
>>>> Using a simplistic description: The probeset db is at the exon
>>>> level; Transcript db is at the gene level.
>>>>
>>>> b
>>>>
>>>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>>>
>>>>> Thanks to everybody,
>>>>> I'm new to working with HuGene ST 1.0 and have some questions:
>>>>>
>>>>>   * I have normalized some CEL files using the oligo package, and
>>>>>     the annotation file used by default is pd.hugene.1.0.st.v1.
>>>>>     How can I access this annotation file to check the type of
>>>>>     control probe sets used? I've tried:
>>>>>
>>>>>       conn<-db(pd.hugene.1.0.st.v1)
>>>>>       dbListTables(conn)
>>>>>       [1] "bgfeature"  "chrom_dict" "core_mps"   "featureSet" "level_dict"
>>>>>       [6] "pmfeature"  "table_info" "type_dict"
>>>>>       dbListFields(conn,"featureSet")
>>>>>        [1] "fsetid"                "strand"                "start"                 "stop"
>>>>>        [5] "transcript_cluster_id" "exon_id"               "crosshyb_type"         "level"
>>>>>        [9] "chrom"                 "type"
>>>>>       sql="SELECT fsetid,type FROM featureSet"
>>>>>       dbGetQuery(conn,sql)
>>>>>       But I get integer numbers (1,2,3...) for the type field
>>>>>       instead of control probe set labels such as "AFFX*",
>>>>>       "other-spike", etc. How can I get this information from the
>>>>>       annotation file?
>>>>>
>>>>>   * What is the difference between hugene10stprobeset.db and
>>>>>     hugene10sttranscriptcluster.db? What is the difference between
>>>>>     summarizing at the probe set level and at the gene level?
>>>>>
>>>>> Thanks again,
>>>>> Javier
>>>>> P.S. If you know of any document that could help me with these
>>>>> arrays, it would be great.
>>>>>
>>>>> R version 2.10.0 (2009-10-26)
>>>>> i386-pc-mingw32
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
>>>>> [4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] annotate_1.24.0           AnnotationDbi_1.8.0       pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>>>> [5] DBI_0.2-4                 oligo_1.10.0              preprocessCore_1.8.0      oligoClasses_1.8.0
>>>>> [9] Biobase_2.6.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] affxparser_1.18.0 affyio_1.14.0     Biostrings_2.14.0 IRanges_1.4.0     splines_2.10.0    tools_2.10.0
>>>>> [7] xtable_1.5-5
>>>>>
>>>>>
>>>>> cstrato wrote:
>>>>>> Dear Javier,
>>>>>>
>>>>>> When you open the Affymetrix annotation files for the HuGene ST
>>>>>> 1.0 array, you will see that it does contain 13 AFFX controls and
>>>>>> a number of "other_spike" controls in both the transcript and the
>>>>>> probeset annotation files. The MoGene array contains 22
>>>>>> "control->affx" probesets, including 13 AFFX controls (bac_spike,
>>>>>> polya_spike).
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>> Javier Pérez Florido wrote:
>>>>>>> Dear list,
>>>>>>> I would like to know if the GeneChip Human Gene ST 1.0 array has
>>>>>>> some gene controls (like AFFX genes in other Affymetrix
>>>>>>> technologies).
>>>>>>> Thanks in advance,
>>>>>>> Javier
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
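
The join discussed in this thread (probeset -> transcript cluster -> type label) can be tried without the array package on a toy in-memory SQLite database. The following is only a sketch: the table and column names are the ones quoted in the thread, but every row, ID and count below is invented for illustration and does not come from pd.hugene.1.0.st.v1. It assumes the DBI and RSQLite packages are installed.

```r
# Toy stand-in for the pd.hugene.1.0.st.v1 tables named in the thread.
# ASSUMPTION: all rows and IDs here are made up; only the table and
# column names (featureSet, core_mps, type_dict, fsetid, meta_fsetid,
# type, type_id) are taken from the thread.
library(DBI)
library(RSQLite)

conn <- dbConnect(SQLite(), ":memory:")

# Integer type code -> human-readable label (subset of the real labels).
type_dict <- data.frame(
  type = 1:3,
  type_id = c("main", "control->affx", "normgene->exon"),
  stringsAsFactors = FALSE
)
# Exon-level probesets, each carrying an integer type code.
feature_set <- data.frame(
  fsetid = c(101L, 102L, 103L, 201L, 301L),
  type = c(1L, 1L, 1L, 2L, 3L)
)
# Mapping from probeset (fsetid) to transcript cluster (meta_fsetid).
core_mps <- data.frame(
  meta_fsetid = c(9001L, 9001L, 9002L, 9003L, 9004L),
  fsetid = c(101L, 102L, 103L, 201L, 301L)
)

dbWriteTable(conn, "type_dict", type_dict)
dbWriteTable(conn, "featureSet", feature_set)
dbWriteTable(conn, "core_mps", core_mps)

# Same join as in the thread: one row per transcript cluster and label.
info <- dbGetQuery(conn, paste(
  "SELECT DISTINCT meta_fsetid AS transcript_id, type_id",
  "FROM featureSet, core_mps, type_dict",
  "WHERE featureSet.fsetid = core_mps.fsetid",
  "AND featureSet.type = type_dict.type"
))

# Number of transcript clusters per type label.
type_counts <- table(info$type_id)
print(type_counts)

# Number of exon-level probesets behind each transcript cluster.
probesets_per_cluster <- dbGetQuery(conn, paste(
  "SELECT meta_fsetid AS transcript_id, COUNT(*) AS n_probesets",
  "FROM core_mps GROUP BY meta_fsetid ORDER BY meta_fsetid"
))
print(probesets_per_cluster)

dbDisconnect(conn)
```

On the real array, replacing the toy connection with conn <- db(pd.hugene.1.0.st.v1) and running the same two queries should give the per-type counts of transcript clusters; table(info$type_id) is one way to check control counts such as those reported above.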


