[BioC] Human Gene ST 1.0 probeset controls

Javier Pérez Florido jpflorido at gmail.com
Mon Nov 9 16:17:09 CET 2009


Thanks Prof. Carvalho,
Is the rest of my e-mail correct?
Thanks,
Javier


Benilton Carvalho wrote:
> XM_001714578 was replaced by NM_001136561.
>
> http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
>
> b
>
> On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:
>
>>
>> OK,
>> So, to sum up (and to check that I understand the Human Gene ST 1.0
>> array): summarizing to the gene level means that several probesets
>> compose a gene. To summarize to the gene level when normalizing, I
>> executed:
>> OligoEset<-rma(OligoRaw,target="core")
>> and I got 33297 genes (transcript ids).
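RMA's summarization step uses median polish. The sketch below illustrates only that step of gene-level summarization: every probe that belongs to one transcript cluster is pooled into a single median polish. It is not oligo's actual code path. Background correction and quantile normalization are skipped, and the intensities, probe names and the `probe_to_tc` map are all hypothetical toy data.

```r
# Toy log2 probe intensities: 6 probes (rows) x 3 arrays (columns).
# Values and IDs are hypothetical, for illustration only.
set.seed(1)
probe_log2 <- matrix(rnorm(18, mean = 8), nrow = 6,
                     dimnames = list(paste0("probe", 1:6), paste0("array", 1:3)))

# Hypothetical map from each probe to its transcript cluster (gene-level ID).
# On the real array, probes from several exon-level probesets share one cluster.
probe_to_tc <- c("tcA", "tcA", "tcA", "tcA", "tcB", "tcB")

# Median polish fits additive probe (row) and array (column) effects;
# the RMA-style summary for each array is overall effect + array effect.
summarize_cluster <- function(x) {
  fit <- stats::medpolish(x, trace.iter = FALSE)
  fit$overall + fit$col
}

# Summarize each cluster separately and stack into a clusters x arrays matrix.
rows_by_tc <- split(seq_len(nrow(probe_log2)), probe_to_tc)
gene_level <- t(sapply(rows_by_tc, function(rows)
  summarize_cluster(probe_log2[rows, , drop = FALSE])))
gene_level
```

The result has one row per transcript cluster, which is why a gene-level `rma()` run returns one value per transcript ID instead of one per exon-level probeset.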
>>
>> Using the following query on pd.hugene.1.0.st.v1:
>> conn<-db(pd.hugene.1.0.st.v1)
>> dbListTables(conn)
>> dbListFields(conn,"type_dict")
>> info2<-"SELECT * FROM type_dict"
>> result<-dbGetQuery(conn,info2)
>>
>> I got:
>> #  type                   type_id
>> #1    1                      main
>> #2    2             control->affx
>> #3    3             control->chip
>> #4    4 control->bgp->antigenomic
>> #5    5     control->bgp->genomic
>> #6    6            normgene->exon
>> #7    7          normgene->intron
>> #8    8  rescue->FLmRNA->unmapped
>>
>> I also executed the following query:
>> conn<-db(pd.hugene.1.0.st.v1)
>> dbListTables(conn)
>> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid AS transcript_id, type_id",
>>              "FROM featureSet, core_mps, type_dict",
>>              "WHERE featureSet.fsetid=core_mps.fsetid",
>>              "AND featureSet.type=type_dict.type"))
>>
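The three-table join in that query is easier to follow when it is written as a minimal base-R sketch on tiny in-memory tables. Only the table and column names come from the query. All IDs and rows are hypothetical, and the real tables in pd.hugene.1.0.st.v1 have many more rows and columns.

```r
# Toy stand-ins for three tables in the pd package (IDs are hypothetical).
featureSet <- data.frame(fsetid = c(101L, 102L, 103L, 104L),
                         type   = c(1L, 1L, 6L, 7L))
core_mps   <- data.frame(fsetid      = c(101L, 102L, 103L, 104L),
                         meta_fsetid = c(9001L, 9001L, 9002L, 9003L))
type_dict  <- data.frame(type    = c(1L, 6L, 7L),
                         type_id = c("main", "normgene->exon", "normgene->intron"),
                         stringsAsFactors = FALSE)

# WHERE featureSet.fsetid = core_mps.fsetid
step1 <- merge(featureSet, core_mps, by = "fsetid")
# AND featureSet.type = type_dict.type
step2 <- merge(step1, type_dict, by = "type")

# SELECT DISTINCT meta_fsetid AS transcript_id, type_id
info <- unique(data.frame(transcript_id = step2$meta_fsetid,
                          type_id       = step2$type_id,
                          stringsAsFactors = FALSE))
info <- info[order(info$transcript_id), ]
rownames(info) <- NULL
info
```

The DISTINCT step matters. Several exon-level probesets (fsetid 101 and 102 here) map to the same transcript cluster, and without it that cluster would appear once per probeset.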
>> I have a fully processed example dataset (summarized to the gene
>> level, with the accession number, symbol, etc. for each transcript
>> ID). I wanted to reproduce it myself from the raw data. Matching the
>> transcript_id field returned by the query above against the
>> transcript_id in the example dataset gives the following:
>>     • control->affx corresponds to the other-spike and AFFX probe sets
>> (57 probe sets)
>>     • normgene->exon corresponds to 1195 pos_control probe sets
>>     • normgene->intron corresponds to 2904 neg_control probe sets
>> So I suppose there are 4156 control transcripts in total (57 + 1195 +
>> 2904).
>>
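The tally above can be sketched as a small function over a query result shaped like `info`, with columns `transcript_id` and `type_id`. This is a minimal sketch under one assumption: a "control" is any label that starts with `control->` or `normgene->`, matching the three categories listed above, so `main` and `rescue->...` are excluded. The toy input is hypothetical.

```r
# Count control transcript clusters in a query result like `info` above.
# Assumption: control labels start with "control->" or "normgene->".
count_controls <- function(info) {
  is_control <- grepl("^(control|normgene)->", info$type_id)
  list(
    rows_per_type = table(info$type_id),
    # unique() guards against a transcript appearing under more than one type_id
    n_control = length(unique(info$transcript_id[is_control]))
  )
}

# Hypothetical toy input, not real array data.
toy_info <- data.frame(
  transcript_id = 1:6,
  type_id = c("main", "main", "control->affx", "normgene->exon",
              "normgene->intron", "rescue->FLmRNA->unmapped"),
  stringsAsFactors = FALSE
)
res <- count_controls(toy_info)
res$n_control

# Cross-check of the total quoted in the message: 57 + 1195 + 2904
sum(c(57L, 1195L, 2904L))
```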
>> Since I summarized to the gene level, I used the annotation package
>> "hugene10sttranscriptcluster.db". I tried to get the accession number
>> and the symbol for some transcript IDs, to check whether the results
>> matched the example I have. For example:
>>
>> hugene10sttranscriptclusterACCNUM[["7912580"]]
>> I get "NM_001136561", but in the example the accession number is
>> XM_001714578. The symbol, however, matches:
>> hugene10sttranscriptclusterSYMBOL[["7912580"]] :  LOC440563
>>
>> Why are there different accession numbers for the same transcript_id?
>>
>> Thanks again,
>> Javier
>>
>> > sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>> [5] LC_TIME=Spanish_Spain.1252
>>
>> attached base packages:
>> [1] tools     tcltk     stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>  [1] pd.hugene.1.0.st.v1_3.0.0            oligoClasses_1.8.0
>>  [3] hugene10stprobeset.db_4.0.1          hugene10sttranscriptcluster.db_4.0.1
>>  [5] org.Hs.eg.db_2.3.6                   oneChannelGUI_1.12.0
>>  [7] preprocessCore_1.8.0                 GOstats_2.12.0
>>  [9] RSQLite_0.7-3                        DBI_0.2-4
>> [11] graph_1.24.0                         Category_2.12.0
>> [13] AnnotationDbi_1.8.0                  tkWidgets_1.24.0
>> [15] DynDoc_1.24.0                        widgetTools_1.24.0
>> [17] affylmGUI_1.20.0                     affyio_1.14.0
>> [19] affy_1.24.0                          limma_3.2.1
>> [21] Biobase_2.6.0
>>
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.24.0   Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
>>  [5] GSEABase_1.8.0    IRanges_1.4.0     RBGL_1.20.0       splines_2.10.0
>>  [9] survival_2.35-7   XML_2.6-0         xtable_1.5-5
>> >
>>
>>
>> Benilton Carvalho wrote:
>>>
>>> Hi Javier,
>>>
>>> This is what you want to do:
>>>
>>> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid AS transcript_id, type_id",
>>>              "FROM featureSet, core_mps, type_dict",
>>>              "WHERE featureSet.fsetid=core_mps.fsetid",
>>>              "AND featureSet.type=type_dict.type"))
>>>
>>> I'll make sure that, in upcoming releases, users are not expected
>>> to work out queries like this.
>>>
>>> In simplified terms: the probeset db (hugene10stprobeset.db) is at
>>> the exon level; the transcript cluster db
>>> (hugene10sttranscriptcluster.db) is at the gene level.
>>>
>>> b
>>>
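The exon-level versus gene-level distinction comes down to a many-to-one map from probesets to transcript clusters. The minimal sketch below assumes, as the query above implies, that `core_mps` links each `fsetid` (exon-level probeset) to a `meta_fsetid` (transcript cluster). The IDs are hypothetical.

```r
# Toy stand-in for core_mps: fsetid = exon-level probeset ID,
# meta_fsetid = transcript cluster (gene-level) ID. IDs are hypothetical.
core_mps <- data.frame(
  fsetid      = c(201L, 202L, 203L, 204L, 205L),
  meta_fsetid = c(9001L, 9001L, 9001L, 9002L, 9002L)
)

# Exon level: one entry per probeset. Gene level: probesets grouped by cluster.
probesets_per_gene <- split(core_mps$fsetid, core_mps$meta_fsetid)
probesets_per_gene
lengths(probesets_per_gene)
```

Annotation keyed by `fsetid` belongs with hugene10stprobeset.db. Annotation keyed by `meta_fsetid` (the IDs returned by a gene-level `rma()` run) belongs with hugene10sttranscriptcluster.db.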
>>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>>
>>>> Thanks, everybody.
>>>> I'm new to working with HuGene ST 1.0 and have some questions:
>>>>
>>>>    * I have normalized some CEL files with the oligo package, which
>>>>      uses pd.hugene.1.0.st.v1 as the annotation package by default.
>>>>      How can I access this annotation package to check which types of
>>>>      control probe sets are used? I've tried:
>>>>
>>>>        conn<-db(pd.hugene.1.0.st.v1)
>>>>        dbListTables(conn)
>>>>        [1] "bgfeature"  "chrom_dict" "core_mps"   "featureSet" "level_dict"
>>>>        [6] "pmfeature"  "table_info" "type_dict"
>>>>        dbListFields(conn,"featureSet")
>>>>         [1] "fsetid"                "strand"                "start"                 "stop"
>>>>         [5] "transcript_cluster_id" "exon_id"               "crosshyb_type"         "level"
>>>>         [9] "chrom"                 "type"
>>>>        sql="SELECT fsetid,type FROM featureSet"
>>>>        dbGetQuery(conn,sql)
>>>>        But I get integers (1, 2, 3, ...) for the type field instead of
>>>>        labels such as "AFFX*", "other-spike", etc. for the control
>>>>        probe sets. How can I get this information?
>>>>
>>>>    * What is the difference between hugene10stprobeset.db and
>>>>      hugene10sttranscriptcluster.db? What is the difference between
>>>>      summarizing at the probe set level and at the gene level?
>>>>
>>>> Thanks again,
>>>> Javier
>>>> P.S. If you know of any documentation that could help me with these
>>>> arrays, that would be great.
>>>>
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
>>>> [4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] annotate_1.24.0           AnnotationDbi_1.8.0       pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>>> [5] DBI_0.2-4                 oligo_1.10.0              preprocessCore_1.8.0      oligoClasses_1.8.0
>>>> [9] Biobase_2.6.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affxparser_1.18.0 affyio_1.14.0     Biostrings_2.14.0 IRanges_1.4.0     splines_2.10.0    tools_2.10.0
>>>> [7] xtable_1.5-5
>>>>
>>>>
>>>> cstrato wrote:
>>>>> Dear Javier,
>>>>>
>>>>> When you open the Affymetrix annotation files for the HuGene ST 1.0
>>>>> array, you will see that the array contains 13 AFFX controls and a
>>>>> number of "other_spike" controls, in both the transcript and the
>>>>> probeset annotation files. The MoGene array contains 22
>>>>> "control->affx" probesets, including 13 AFFX controls (bac_spike,
>>>>> polya_spike).
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>> _._._._._._._._._._._._._._._._._._
>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>> _._._._._._._._._._._._._._._._._._
>>>>>
>>>>>
>>>>> Javier Pérez Florido wrote:
>>>>>> Dear list,
>>>>>> I would like to know whether the GeneChip Human Gene ST 1.0 array
>>>>>> has any gene controls (like the AFFX genes on other Affymetrix
>>>>>> platforms).
>>>>>> Thanks in advance,
>>>>>> Javier
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>


