[BioC] Human Gene ST 1.0 probeset controls

Benilton Carvalho bcarvalh at jhsph.edu
Mon Nov 9 13:28:58 CET 2009


XM_001714578 was replaced by NM_001136561.

http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
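
If you want to flag such cases when comparing an older processed table against the current annotation, something like the sketch below may help. It is only an illustration in base R: the data frame and its column names are made up, and only the three identifiers come from this thread. XM_ accessions are RefSeq predicted (model) mRNAs and NM_ accessions are curated mRNAs, so a record can move from one to the other between releases.

```r
# Accessions for transcript cluster 7912580, taken from this thread.
# The data frame layout and column names are hypothetical.
ids <- data.frame(
  transcript_id = "7912580",
  example_acc   = "XM_001714578",  # from the older processed example
  current_acc   = "NM_001136561",  # from hugene10sttranscriptcluster.db
  stringsAsFactors = FALSE
)

# Keep only the RefSeq prefix (the letters before the underscore).
ids$example_prefix <- sub("_.*$", "", ids$example_acc)
ids$current_prefix <- sub("_.*$", "", ids$current_acc)

# Flag rows where the accession differs between the two sources.
ids$changed <- ids$example_acc != ids$current_acc
print(ids)
```

Rows with changed == TRUE are the ones worth looking up at NCBI to see whether the old record was replaced.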

b

On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:

>
> OK,
> So, to sum up (and to check that I understand the Human Gene ST 1.0  
> array): summarizing to the gene level means that several probesets  
> make up one gene. To summarize to the gene level when normalizing,  
> I executed:
> OligoEset<-rma(OligoRaw,target="core")
> and I got 33297 genes (transcript ids).
>
> Using the following query on pd.hugene.1.0.st.v1:
> dbListTables(conn)
> dbListFields(conn,"type_dict")
> info2<-"SELECT * from type_dict"
> result<-dbGetQuery(conn,info2)
>
> I got:
> #  type                   type_id
> #1    1                      main
> #2    2             control->affx
> #3    3             control->chip
> #4    4 control->bgp->antigenomic
> #5    5     control->bgp->genomic
> #6    6            normgene->exon
> #7    7          normgene->intron
> #8    8  rescue->FLmRNA->unmapped
>
> I also executed the following query:
> conn<-db(pd.hugene.1.0.st.v1)
> dbListTables(conn)
> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>              "FROM featureSet, core_mps, type_dict",
>              "WHERE featureSet.fsetid=core_mps.fsetid",
>               "AND featureSet.type=type_dict.type"))
>
> I have a complete processed example (it is summarized to the gene  
> level, and it has the ACC number, Symbol information, etc for each  
> transcript id). I wanted to reproduce the example by myself using  
> the raw data. When matching the transcript_id field given by the  
> above query and the transcript_id given by the example data set, the  
> following information can be extracted:
> 	• control->affx corresponds to the other-spike and AFFX probe sets  
> (57 probe sets)
> 	• normgene->exon corresponds to 1195 pos_control probe sets
> 	• normgene->intron corresponds to 2904 neg_control probe sets
> So, I suppose that there are about 4156 control transcripts.
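
The bookkeeping here can be checked without the database. A minimal base R sketch, assuming only the type_dict listing and the three counts quoted in this message (the object names are made up): merge() plays the role of the SQL join on featureSet.type = type_dict.type.

```r
# type_dict as returned by "SELECT * from type_dict" in this message.
type_dict <- data.frame(
  type = 1:8,
  type_id = c("main", "control->affx", "control->chip",
              "control->bgp->antigenomic", "control->bgp->genomic",
              "normgene->exon", "normgene->intron",
              "rescue->FLmRNA->unmapped"),
  stringsAsFactors = FALSE
)

# Counts reported in this message, keyed by the integer type code.
counts <- data.frame(type = c(2L, 6L, 7L), n = c(57L, 1195L, 2904L))

# Attach the readable label to each integer code.
labelled <- merge(counts, type_dict, by = "type")
print(labelled[, c("type_id", "n")])

total_controls <- sum(labelled$n)
print(total_controls)  # 57 + 1195 + 2904 = 4156
```

So the three categories add up to exactly 4156 transcript ids.
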
>
> Since I summarized to the gene level, I have used the annotation  
> file "hugene10sttranscriptcluster.db". I've tried to get the ACC  
> number and the Symbol for some transcript_id. The idea was to check  
> if the results given were the same as the example I have. For example:
>
> hugene10sttranscriptclusterACCNUM[["7912580"]]
> I get "NM_001136561", but in the example, the accession number is  
> XM_001714578. However, I get the same result for Symbol:
> hugene10sttranscriptclusterSYMBOL[["7912580"]] :  LOC440563
>
> Why are there different accession numbers for the same transcript_id?
>
> Thanks again,
> Javier
>
> > sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] tools     tcltk     stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] pd.hugene.1.0.st.v1_3.0.0            oligoClasses_1.8.0
>  [3] hugene10stprobeset.db_4.0.1          hugene10sttranscriptcluster.db_4.0.1
>  [5] org.Hs.eg.db_2.3.6                   oneChannelGUI_1.12.0
>  [7] preprocessCore_1.8.0                 GOstats_2.12.0
>  [9] RSQLite_0.7-3                        DBI_0.2-4
> [11] graph_1.24.0                         Category_2.12.0
> [13] AnnotationDbi_1.8.0                  tkWidgets_1.24.0
> [15] DynDoc_1.24.0                        widgetTools_1.24.0
> [17] affylmGUI_1.20.0                     affyio_1.14.0
> [19] affy_1.24.0                          limma_3.2.1
> [21] Biobase_2.6.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.24.0   Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
>  [5] GSEABase_1.8.0    IRanges_1.4.0     RBGL_1.20.0       splines_2.10.0
>  [9] survival_2.35-7   XML_2.6-0         xtable_1.5-5
> >
>
>
> Benilton Carvalho wrote:
>>
>> Hi Javier,
>>
>> This is what you want to do:
>>
>> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>              "FROM featureSet, core_mps, type_dict",
>>              "WHERE featureSet.fsetid=core_mps.fsetid",
>>               "AND featureSet.type=type_dict.type"))
>>
>> I'll make sure that, in the next releases, the users are not  
>> expected to figure out queries like this.
>>
>> Using a simplistic description: The probeset db is at the exon  
>> level; Transcript db is at the gene level.
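
To make the exon-level versus gene-level distinction concrete, here is a toy base R sketch. Everything in it is invented for illustration except the cluster id 7912580: the fsetid values, the second cluster id and the intensities are made up, and the table only mimics the fsetid / meta_fsetid columns of core_mps. The median is just a stand-in to show several probesets collapsing into one row per transcript cluster; rma(target="core") does its own summarization and does not work this way.

```r
# Toy stand-in for core_mps: exon-level fsetid -> gene-level meta_fsetid.
# fsetid values and the second cluster id are invented.
core_mps <- data.frame(
  fsetid      = c(101L, 102L, 103L, 201L, 202L),
  meta_fsetid = c(7912580L, 7912580L, 7912580L, 7900001L, 7900001L)
)

# Invented exon-level (probeset) log2 intensities for one sample.
exon_level <- data.frame(
  fsetid   = c(101L, 102L, 103L, 201L, 202L),
  log2_int = c(7.1, 7.4, 6.9, 9.8, 10.2)
)

# Attach the transcript cluster to each probeset, then collapse.
both <- merge(exon_level, core_mps, by = "fsetid")
gene_level <- aggregate(log2_int ~ meta_fsetid, data = both, FUN = median)

print(nrow(exon_level))  # number of exon-level rows
print(gene_level)        # one row per transcript cluster
```

The probeset annotation package describes rows like those in exon_level; the transcript cluster package describes rows like those in gene_level.
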
>>
>> b
>>
>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>
>>> Thanks to everybody,
>>> I'm new to working with the HuGene ST 1.0 array and have some questions:
>>>
>>>    * I have normalized some CEL files using the oligo package; the
>>>      annotation file used by default is pd.hugene.1.0.st.v1. How
>>>      can I access this annotation file to check the types of
>>>      control probe sets used? I've tried:
>>>
>>>        conn<-db(pd.hugene.1.0.st.v1)
>>>        dbListTables(conn)
>>>        [1] "bgfeature"  "chrom_dict" "core_mps"   "featureSet" "level_dict"
>>>        [6] "pmfeature"  "table_info" "type_dict"
>>>        dbListFields(conn,"featureSet")
>>>         [1] "fsetid"                "strand"                "start"                 "stop"
>>>         [5] "transcript_cluster_id" "exon_id"               "crosshyb_type"         "level"
>>>         [9] "chrom"                 "type"
>>>        sql="SELECT fsetid,type FROM featureSet"
>>>        dbGetQuery(conn,sql)
>>>        But I get integer numbers (1,2,3...) for the type field instead
>>>        of labels such as "AFFX*" or "other-spike" for the control probe
>>>        sets. How can I get this information from the annotation file?
>>>
>>>    * What is the difference between hugene10stprobeset.db and
>>>      hugene10sttranscriptcluster.db? What is the difference between
>>>      summarizing at the probe set level and at the gene level?
>>>
>>> Thanks again,
>>> Javier
>>> P.S. If you know of any document that could help me with these arrays,
>>> it would be great.
>>>
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
>>> [4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] annotate_1.24.0           AnnotationDbi_1.8.0       pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>> [5] DBI_0.2-4                 oligo_1.10.0              preprocessCore_1.8.0      oligoClasses_1.8.0
>>> [9] Biobase_2.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.18.0 affyio_1.14.0     Biostrings_2.14.0 IRanges_1.4.0     splines_2.10.0    tools_2.10.0
>>> [7] xtable_1.5-5
>>>
>>>
>>> cstrato wrote:
>>>> Dear Javier,
>>>>
>>>> When you open the Affymetrix annotation files for the HuGene ST 1.0
>>>> array you will see that it does contain 13 AFFX controls and a
>>>> number of "other_spike" controls for both the transcript and the
>>>> probeset annotation files. The MoGene array contains 22
>>>> "control->affx" probesets including 13 AFFX controls (bac_spike,
>>>> polya_spike).
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>> e.m.a.i.l:        cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._._._
>>>>
>>>>
>>>> Javier Pérez Florido wrote:
>>>>> Dear list,
>>>>> I would like to know if the GeneChip Human Gene ST 1.0 array has some
>>>>> gene controls (like AFFX genes in other Affymetrix technologies).
>>>>> Thanks in advance,
>>>>> Javier
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>


