[BioC] Human Gene ST 1.0 probeset controls
Benilton Carvalho
bcarvalh at jhsph.edu
Mon Nov 9 13:28:58 CET 2009
XM_001714578 was replaced by NM_001136561.
http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
b
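
As a rough illustration of what the query quoted below does, here is a minimal sketch in base R. The three data frames are toy stand-ins with invented rows (only the column names and the type labels come from this thread), and merge() takes the place of the SQL joins, so this is not how the pd.hugene.1.0.st.v1 package itself is queried:

```r
# Toy stand-ins for three tables of the annotation database.
# All rows are invented for illustration; only the column names and
# the type labels are taken from the thread.
type_dict <- data.frame(
  type    = 1:8,
  type_id = c("main", "control->affx", "control->chip",
              "control->bgp->antigenomic", "control->bgp->genomic",
              "normgene->exon", "normgene->intron",
              "rescue->FLmRNA->unmapped"),
  stringsAsFactors = FALSE
)
featureSet <- data.frame(
  fsetid = c(101L, 102L, 103L, 104L, 105L),
  type   = c(1L, 1L, 2L, 6L, 7L)
)
core_mps <- data.frame(
  meta_fsetid = c(9001L, 9001L, 9002L, 9003L, 9004L),
  fsetid      = c(101L, 102L, 103L, 104L, 105L)
)

# Same joins as the SQL query: featureSet to core_mps on fsetid,
# then to type_dict on type.
joined <- merge(merge(featureSet, core_mps, by = "fsetid"),
                type_dict, by = "type")

# Equivalent of: SELECT DISTINCT meta_fsetid AS transcript_id, type_id
info <- unique(data.frame(transcript_id = joined$meta_fsetid,
                          type_id       = joined$type_id,
                          stringsAsFactors = FALSE))

# Number of transcript clusters per type label.
type_counts <- table(info$type_id)
print(type_counts)

# Transcript clusters whose type is anything other than "main".
# Whether every non-main type counts as a control is an assumption.
non_main <- info[info$type_id != "main", ]
print(non_main)
```

With the real info data frame returned by dbGetQuery(), the same table(info$type_id) call should give the number of transcript clusters for each type.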
On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:
>
> OK,
> So, to sum up (and to check that I understand the Human Gene ST 1.0
> array): summarizing to the gene level means that several probesets
> are combined into one gene. To summarize to the gene level when
> normalizing, I executed:
> OligoEset<-rma(OligoRaw,target="core")
> and I got 33297 genes (transcript ids).
>
> Using the following query on pd.hugene.1.0.st.v1:
> dbListTables(conn)
> dbListFields(conn,"type_dict")
> info2<-"SELECT * from type_dict"
> result<-dbGetQuery(conn,info2)
>
> I got:
> # type type_id
> #1 1 main
> #2 2 control->affx
> #3 3 control->chip
> #4 4 control->bgp->antigenomic
> #5 5 control->bgp->genomic
> #6 6 normgene->exon
> #7 7 normgene->intron
> #8 8 rescue->FLmRNA->unmapped
>
> I also executed the following query:
> conn<-db(pd.hugene.1.0.st.v1)
> dbListTables(conn)
> info = dbGetQuery(conn, paste(
> "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
> "FROM featureSet, core_mps, type_dict",
> "WHERE featureSet.fsetid=core_mps.fsetid",
> "AND featureSet.type=type_dict.type"))
>
> I have a complete processed example (it is summarized to the gene
> level, and it has the ACC number, Symbol information, etc for each
> transcript id). I wanted to reproduce the example by myself using
> the raw data. When matching the transcript_id field given by the
> above query and the transcript_id given by the example data set, the
> following information can be extracted:
> • control->affx are related to the other-spike and AFFX probe sets
> (57 probe sets)
> • normgene->exon are related to 1195 pos_control probe sets
> • normgene->intron are related to 2904 neg_control probe sets
> So, I suppose that there are 4156 control transcripts
> (57 + 1195 + 2904).
>
> Since I summarized to the gene level, I have used the annotation
> file "hugene10sttranscriptcluster.db". I've tried to get the ACC
> number and the Symbol for some transcript_id. The idea was to check
> if the results given were the same as the example I have. For example:
>
> hugene10sttranscriptclusterACCNUM[["7912580"]]
> I get "NM_001136561", but in the example, the accession number is
> XM_001714578. However, I get the same result for Symbol:
> hugene10sttranscriptclusterSYMBOL[["7912580"]] : LOC440563
>
> Why are there different accession numbers for the same transcript_id?
>
> Thanks again,
> Javier
>
> > sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] tools tcltk stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] pd.hugene.1.0.st.v1_3.0.0 oligoClasses_1.8.0
> [3] hugene10stprobeset.db_4.0.1 hugene10sttranscriptcluster.db_4.0.1
> [5] org.Hs.eg.db_2.3.6 oneChannelGUI_1.12.0
> [7] preprocessCore_1.8.0 GOstats_2.12.0
> [9] RSQLite_0.7-3 DBI_0.2-4
> [11] graph_1.24.0 Category_2.12.0
> [13] AnnotationDbi_1.8.0 tkWidgets_1.24.0
> [15] DynDoc_1.24.0 widgetTools_1.24.0
> [17] affylmGUI_1.20.0 affyio_1.14.0
> [19] affy_1.24.0 limma_3.2.1
> [21] Biobase_2.6.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.24.0 Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
> [5] GSEABase_1.8.0 IRanges_1.4.0 RBGL_1.20.0 splines_2.10.0
> [9] survival_2.35-7 XML_2.6-0 xtable_1.5-5
> >
>
>
> Benilton Carvalho wrote:
>>
>> Hi Javier,
>>
>> This is what you want to do:
>>
>> info = dbGetQuery(conn, paste(
>> "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>> "FROM featureSet, core_mps, type_dict",
>> "WHERE featureSet.fsetid=core_mps.fsetid",
>> "AND featureSet.type=type_dict.type"))
>>
>> I'll make sure that, in the next releases, the users are not
>> expected to figure out queries like this.
>>
>> Using a simplistic description: The probeset db is at the exon
>> level; Transcript db is at the gene level.
>>
>> b
>>
>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>
>>> Thanks to everybody,
>>> I'm new to working with HuGene ST 1.0 and have some questions:
>>>
>>> * I have normalized some CEL files using the oligo package, and
>>> the annotation package used by default is pd.hugene.1.0.st.v1.
>>> How can I access this annotation package to check the types of
>>> control probe sets used? I've tried:
>>>
>>> conn<-db(pd.hugene.1.0.st.v1)
>>> dbListTables(conn)
>>> [1] "bgfeature" "chrom_dict" "core_mps" "featureSet" "level_dict"
>>> [6] "pmfeature" "table_info" "type_dict"
>>> dbListFields(conn,"featureSet")
>>> [1] "fsetid" "strand" "start" "stop"
>>> [5] "transcript_cluster_id" "exon_id" "crosshyb_type" "level"
>>> [9] "chrom" "type"
>>> sql="SELECT fsetid,type FROM featureSet"
>>> dbGetQuery(conn,sql)
>>> But I get integer numbers (1,2,3...) for the type field instead
>>> of the control probe set labels ("AFFX*", "other-spike", etc.).
>>> How can I get this information from the annotation file?
>>>
>>> * What is the difference between hugene10stprobeset.db and
>>> hugene10sttranscriptcluster.db? What is the difference between
>>> summarizing at the probe set level and at the gene level?
>>>
>>> Thanks again,
>>> Javier
>>> P.S. If you know of any document that could help me with these
>>> arrays, it would be great.
>>>
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
>>> [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] annotate_1.24.0 AnnotationDbi_1.8.0 pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>> [5] DBI_0.2-4 oligo_1.10.0 preprocessCore_1.8.0 oligoClasses_1.8.0
>>> [9] Biobase_2.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.18.0 affyio_1.14.0 Biostrings_2.14.0 IRanges_1.4.0 splines_2.10.0 tools_2.10.0
>>> [7] xtable_1.5-5
>>>
>>>
>>> cstrato wrote:
>>>> Dear Javier,
>>>>
>>>> When you open the Affymetrix annotation files for the HuGene ST 1.0
>>>> array you will see that the array contains 13 AFFX controls and a
>>>> number of "other_spike" controls in both the transcript and the
>>>> probeset annotation files. The MoGene array contains 22
>>>> "control->affx" probesets including 13 AFFX controls (bac_spike,
>>>> polya_spike).
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>> e.m.a.i.l: cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._._._
>>>>
>>>>
>>>> Javier Pérez Florido wrote:
>>>>> Dear list,
>>>>> I would like to know if the GeneChip Human Gene ST 1.0 array has
>>>>> some gene controls (like AFFX genes in other Affymetrix
>>>>> technologies).
>>>>> Thanks in advance,
>>>>> Javier
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>