[BioC] Human Gene ST 1.0 probeset controls
Javier Pérez Florido
jpflorido at gmail.com
Mon Nov 9 16:17:09 CET 2009
Thanks Prof. Carvalho,
Is the rest of my e-mail correct?
Thanks,
Javier
Benilton Carvalho wrote:
> XM_001714578 was replaced by NM_001136561.
>
> http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
>
> b
>
> On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:
>
>>
>> OK,
>> So, to sum up (and to check that I understand the Human Gene ST 1.0
>> array): summarizing to the gene level means that several probesets
>> together make up a gene. To summarize to the gene level
>> when normalizing, I executed:
>> OligoEset<-rma(OligoRaw,target="core")
>> and I got 33297 genes (transcript ids).
>>
>> Using the following query on pd.hugene.1.0.st.v1:
>> dbListTables(conn)
>> dbListFields(conn,"type_dict")
>> info2<-"SELECT * from type_dict"
>> result<-dbGetQuery(conn,info2)
>>
>> I got:
>> #   type type_id
>> # 1    1 main
>> # 2    2 control->affx
>> # 3    3 control->chip
>> # 4    4 control->bgp->antigenomic
>> # 5    5 control->bgp->genomic
>> # 6    6 normgene->exon
>> # 7    7 normgene->intron
>> # 8    8 rescue->FLmRNA->unmapped
>>
>> I also executed the following query:
>> conn<-db(pd.hugene.1.0.st.v1)
>> dbListTables(conn)
>> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>> "FROM featureSet, core_mps, type_dict",
>> "WHERE featureSet.fsetid=core_mps.fsetid",
>> "AND featureSet.type=type_dict.type"))
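
For readers without the pd.hugene.1.0.st.v1 database at hand, the join in the query above can be imitated with base R data frames. This is only a rough sketch: the table and column names are taken from the thread, but every id in the toy tables is invented, and merge() merely stands in for the SQL join.

```r
# Toy stand-ins for three tables of the pd.hugene.1.0.st.v1 SQLite database.
# All ids below are invented for illustration; only the column names and the
# type labels come from the thread.
featureSet <- data.frame(fsetid = c(101L, 102L, 103L, 104L),
                         type   = c(1L, 1L, 2L, 7L))
core_mps <- data.frame(meta_fsetid = c(9001L, 9001L, 9002L, 9003L),
                       fsetid      = c(101L, 102L, 103L, 104L))
type_dict <- data.frame(type    = c(1L, 2L, 7L),
                        type_id = c("main", "control->affx", "normgene->intron"),
                        stringsAsFactors = FALSE)

# WHERE featureSet.fsetid = core_mps.fsetid
joined <- merge(featureSet, core_mps, by = "fsetid")
# AND featureSet.type = type_dict.type
joined <- merge(joined, type_dict, by = "type")

# SELECT DISTINCT meta_fsetid AS transcript_id, type_id
info <- unique(data.frame(transcript_id = joined$meta_fsetid,
                          type_id       = joined$type_id,
                          stringsAsFactors = FALSE))
info <- info[order(info$transcript_id), ]
rownames(info) <- NULL
print(info)

# Number of transcript ids per type label
print(table(info$type_id))
```

In the toy data, two probesets (101 and 102) share the transcript id 9001, so DISTINCT collapses them into a single row; that is the same reason the real query returns one row per transcript id rather than one per probeset.
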
>>
>> I have a complete processed example (it is summarized to the gene
>> level, and it has the ACC number, Symbol information, etc for each
>> transcript id). I wanted to reproduce the example by myself using the
>> raw data. When matching the transcript_id field given by the above
>> query and the transcript_id given by the example data set, the
>> following information can be extracted:
>> • control->affx are related to other-spike and AFFX probe sets (57
>> probe sets)
>> • normgene->exon are related to 1195 pos_control probe sets
>> • normgene->intron are related to 2904 neg_control probe sets
>> So, I suppose that there are about 4156 control transcripts.
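
The total above is simply the sum of the three matched groups. A small base R check, using only the counts quoted in this message; the name `control_counts` is made up for illustration, and the share at the end assumes that the 33297 transcript ids returned by rma() include these controls, which the thread does not state explicitly.

```r
# Counts of matched control probe sets, as quoted in the message.
control_counts <- c("control->affx"    = 57,
                    "normgene->exon"   = 1195,
                    "normgene->intron" = 2904)

total_controls <- sum(control_counts)
print(total_controls)

# Assumption: the controls are part of the 33297 ids obtained with
# rma(OligoRaw, target = "core").
total_ids <- 33297
print(round(total_controls / total_ids, 3))
```
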
>>
>> Since I summarized to the gene level, I have used the annotation file
>> "hugene10sttranscriptcluster.db". I've tried to get the ACC number
>> and the Symbol for some transcript_id. The idea was to check if the
>> results given were the same as the example I have. For example:
>>
>> hugene10sttranscriptclusterACCNUM[["7912580"]]
>> I get "NM_001136561", but in the example, the accession number is
>> XM_001714578. However, I get the same result for Symbol:
>> hugene10sttranscriptclusterSYMBOL[["7912580"]] : LOC440563
>>
>> Why are there different accession numbers for the same transcript_id?
>>
>> Thanks again,
>> Javier
>>
>> > sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>> [5] LC_TIME=Spanish_Spain.1252
>>
>> attached base packages:
>> [1] tools tcltk stats graphics grDevices utils datasets
>> [8] methods base
>>
>> other attached packages:
>> [1] pd.hugene.1.0.st.v1_3.0.0 oligoClasses_1.8.0
>> [3] hugene10stprobeset.db_4.0.1
>> hugene10sttranscriptcluster.db_4.0.1
>> [5] org.Hs.eg.db_2.3.6 oneChannelGUI_1.12.0
>> [7] preprocessCore_1.8.0 GOstats_2.12.0
>> [9] RSQLite_0.7-3 DBI_0.2-4
>> [11] graph_1.24.0 Category_2.12.0
>> [13] AnnotationDbi_1.8.0 tkWidgets_1.24.0
>> [15] DynDoc_1.24.0 widgetTools_1.24.0
>> [17] affylmGUI_1.20.0 affyio_1.14.0
>> [19] affy_1.24.0 limma_3.2.1
>> [21] Biobase_2.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] annotate_1.24.0 Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
>> [5] GSEABase_1.8.0 IRanges_1.4.0 RBGL_1.20.0
>> splines_2.10.0
>> [9] survival_2.35-7 XML_2.6-0 xtable_1.5-5
>> >
>>
>>
>> Benilton Carvalho wrote:
>>>
>>> Hi Javier,
>>>
>>> This is what you want to do:
>>>
>>> info = dbGetQuery(conn, paste("SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>> "FROM featureSet, core_mps, type_dict",
>>> "WHERE featureSet.fsetid=core_mps.fsetid",
>>> "AND featureSet.type=type_dict.type"))
>>>
>>> I'll make sure that, in the next releases, the users are not
>>> expected to figure out queries like this.
>>>
>>> Using a simplistic description: The probeset db is at the exon
>>> level; Transcript db is at the gene level.
>>>
>>> b
>>>
>>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>>
>>>> Thanks to everybody,
>>>> I'm new to working with HuGene ST 1.0 arrays and have some questions:
>>>>
>>>> * I have normalized some CEL files using the oligo package, and the
>>>> annotation file used by default is pd.hugene.1.0.st.v1. How
>>>> can I access this annotation file to check the type of control
>>>> probe sets used? I've tried:
>>>>
>>>> conn<-db(pd.hugene.1.0.st.v1)
>>>> dbListTables(conn)
>>>> [1] "bgfeature" "chrom_dict" "core_mps" "featureSet" "level_dict"
>>>> [6] "pmfeature" "table_info" "type_dict"
>>>> dbListFields(conn,"featureSet")
>>>> [1] "fsetid" "strand" "start" "stop"
>>>> [5] "transcript_cluster_id" "exon_id" "crosshyb_type" "level"
>>>> [9] "chrom" "type"
>>>> sql="SELECT fsetid,type FROM featureSet"
>>>> dbGetQuery(conn,sql)
>>>> But I get integer numbers (1, 2, 3, ...) for the type field instead
>>>> of labels such as "AFFX*" or "other-spike" for the control probe
>>>> sets. How can I get this information?
>>>>
>>>> * What is the difference between hugene10stprobeset.db and
>>>> hugene10sttranscriptcluster.db? What is the difference between
>>>> summarizing at the probe set level and at the gene level?
>>>>
>>>> Thanks again,
>>>> Javier
>>>> P.S. If you know of any document that could help me with these
>>>> arrays, it would be great.
>>>>
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
>>>> LC_MONETARY=Spanish_Spain.1252
>>>> [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] annotate_1.24.0 AnnotationDbi_1.8.0
>>>> pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>>> [5] DBI_0.2-4 oligo_1.10.0
>>>> preprocessCore_1.8.0 oligoClasses_1.8.0
>>>> [9] Biobase_2.6.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affxparser_1.18.0 affyio_1.14.0 Biostrings_2.14.0
>>>> IRanges_1.4.0 splines_2.10.0 tools_2.10.0
>>>> [7] xtable_1.5-5
>>>>
>>>>
>>>> cstrato wrote:
>>>>> Dear Javier,
>>>>>
>>>>> When you open the Affymetrix annotation files for the HuGene ST 1.0
>>>>> array you will see that it does contain 13 AFFX controls and a
>>>>> number of "other_spike" controls for both the transcript and the
>>>>> probeset annotation files. The MoGene array contains 22
>>>>> "control->affx" probesets including 13 AFFX controls (bac_spike,
>>>>> polya_spike).
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>> _._._._._._._._._._._._._._._._._._
>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>> e.m.a.i.l: cstrato at aon.at
>>>>> _._._._._._._._._._._._._._._._._._
>>>>>
>>>>>
>>>>> Javier Pérez Florido wrote:
>>>>>> Dear list,
>>>>>> I would like to know if the GeneChip Human Gene ST 1.0 array has
>>>>>> some gene controls (like AFFX genes in other Affymetrix
>>>>>> technologies).
>>>>>> Thanks in advance,
>>>>>> Javier
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>