[BioC] Human Gene ST 1.0 probeset controls
Benilton Carvalho
bcarvalh at jhsph.edu
Mon Nov 9 16:37:00 CET 2009
Yes. Everything except the "Prof." part is correct. ;)
b
On Nov 9, 2009, at 1:17 PM, Javier Pérez Florido wrote:
> Thanks Prof. Carvalho,
> Is the rest of my e-mail correct?
> Thanks,
> Javier
>
>
> Benilton Carvalho wrote:
>> XM_001714578 was replaced by NM_001136561.
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/NM_001136561?log$=seqview_status
>>
>> b
>>
>> On Nov 9, 2009, at 9:19 AM, Javier Pérez Florido wrote:
>>
>>>
>>> OK,
>>> So, to sum up (and to check that I understand the Human Gene ST 1.0
>>> array): summarizing to the gene level means combining the several
>>> probesets that make up a gene. To summarize to the gene level when
>>> normalizing, I executed:
>>> OligoEset<-rma(OligoRaw,target="core")
>>> and I got 33297 genes (transcript ids).
>>>
>>> Using the following query on pd.hugene.1.0.st.v1:
>>> dbListTables(conn)
>>> dbListFields(conn,"type_dict")
>>> info2<-"SELECT * from type_dict"
>>> result<-dbGetQuery(conn,info2)
>>>
>>> I got:
>>> #   type  type_id
>>> # 1    1  main
>>> # 2    2  control->affx
>>> # 3    3  control->chip
>>> # 4    4  control->bgp->antigenomic
>>> # 5    5  control->bgp->genomic
>>> # 6    6  normgene->exon
>>> # 7    7  normgene->intron
>>> # 8    8  rescue->FLmRNA->unmapped
>>>
>>> I also executed the following query:
>>> conn<-db(pd.hugene.1.0.st.v1)
>>> dbListTables(conn)
>>> info = dbGetQuery(conn, paste(
>>>     "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>>     "FROM featureSet, core_mps, type_dict",
>>>     "WHERE featureSet.fsetid=core_mps.fsetid",
>>>     "AND featureSet.type=type_dict.type"))
>>>
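For readers following along: the query above joins three tables so that each transcript cluster carries a readable type label instead of an integer code. Below is a minimal sketch of the same join in base R, using tiny made-up data frames in place of the real featureSet, core_mps and type_dict tables. The ids and rows are hypothetical; only the column names and the type labels come from this thread.

```r
# Toy stand-ins for the pd.hugene.1.0.st.v1 tables (all rows are invented).
featureSet <- data.frame(fsetid = c(101, 102, 103, 104),
                         type   = c(1, 1, 2, 7))
core_mps   <- data.frame(meta_fsetid = c(9001, 9001, 9002, 9003),
                         fsetid      = c(101, 102, 103, 104))
type_dict  <- data.frame(type    = c(1, 2, 7),
                         type_id = c("main", "control->affx", "normgene->intron"),
                         stringsAsFactors = FALSE)

# WHERE featureSet.fsetid = core_mps.fsetid
step1 <- merge(featureSet, core_mps, by = "fsetid")
# AND featureSet.type = type_dict.type
step2 <- merge(step1, type_dict, by = "type")

# SELECT DISTINCT meta_fsetid AS transcript_id, type_id
info <- unique(data.frame(transcript_id = step2$meta_fsetid,
                          type_id       = step2$type_id,
                          stringsAsFactors = FALSE))
info <- info[order(info$transcript_id), ]
print(info)
```

The two probesets that share transcript cluster 9001 collapse into a single row, which is what DISTINCT does in the SQL version.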
>>> I have a fully processed example data set (summarized to the gene
>>> level, with the accession number, symbol, etc. for each transcript
>>> id), and I wanted to reproduce it myself from the raw data. Matching
>>> the transcript_id field returned by the query above against the
>>> transcript_id in the example data set shows the following:
>>> • control->affx corresponds to the other-spike and AFFX probe sets
>>> (57 probe sets)
>>> • normgene->exon corresponds to the 1195 pos_control probe sets
>>> • normgene->intron corresponds to the 2904 neg_control probe sets
>>> So, I suppose that there are about 4156 control transcripts.
>>>
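The total of 4156 is simply the sum of the three matched control groups. A quick check, assuming only the counts reported in this message (57, 1195 and 2904) and the 33297 transcript ids returned by rma(), and assuming all of these controls are among those 33297 ids:

```r
# Counts reported in the message above.
controls <- c("control->affx"    = 57,
              "normgene->exon"   = 1195,
              "normgene->intron" = 2904)

n_controls <- sum(controls)
n_total    <- 33297                 # transcript ids from rma(..., target = "core")
n_other    <- n_total - n_controls  # ids outside these three control groups

cat("control transcripts:", n_controls, "\n")
cat("remaining transcript ids:", n_other, "\n")
```

With the real `info` data frame from the query, `table(info$type_id)` should give the per-type counts directly.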
>>> Since I summarized to the gene level, I used the annotation package
>>> "hugene10sttranscriptcluster.db". I tried to get the accession number
>>> and the symbol for some transcript ids, to check whether the results
>>> match the example I have. For example:
>>>
>>> hugene10sttranscriptclusterACCNUM[["7912580"]]
>>> I get "NM_001136561", but in the example, the accession number is
>>> XM_001714578. However, I get the same result for Symbol:
>>> hugene10sttranscriptclusterSYMBOL[["7912580"]] : LOC440563
>>>
>>> Why are there different accession numbers for the same transcript_id?
>>>
>>> Thanks again,
>>> Javier
>>>
>>>> sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
>>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>>> [5] LC_TIME=Spanish_Spain.1252
>>>
>>> attached base packages:
>>> [1] tools tcltk stats graphics grDevices utils datasets
>>> [8] methods base
>>>
>>> other attached packages:
>>> [1] pd.hugene.1.0.st.v1_3.0.0 oligoClasses_1.8.0
>>> [3] hugene10stprobeset.db_4.0.1 hugene10sttranscriptcluster.db_4.0.1
>>> [5] org.Hs.eg.db_2.3.6 oneChannelGUI_1.12.0
>>> [7] preprocessCore_1.8.0 GOstats_2.12.0
>>> [9] RSQLite_0.7-3 DBI_0.2-4
>>> [11] graph_1.24.0 Category_2.12.0
>>> [13] AnnotationDbi_1.8.0 tkWidgets_1.24.0
>>> [15] DynDoc_1.24.0 widgetTools_1.24.0
>>> [17] affylmGUI_1.20.0 affyio_1.14.0
>>> [19] affy_1.24.0 limma_3.2.1
>>> [21] Biobase_2.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] annotate_1.24.0 Biostrings_2.14.0 genefilter_1.28.0 GO.db_2.3.5
>>> [5] GSEABase_1.8.0 IRanges_1.4.0 RBGL_1.20.0 splines_2.10.0
>>> [9] survival_2.35-7 XML_2.6-0 xtable_1.5-5
>>>>
>>>
>>>
>>> Benilton Carvalho wrote:
>>>>
>>>> Hi Javier,
>>>>
>>>> This is what you want to do:
>>>>
>>>> info = dbGetQuery(conn, paste(
>>>>     "SELECT DISTINCT meta_fsetid as transcript_id, type_id",
>>>>     "FROM featureSet, core_mps, type_dict",
>>>>     "WHERE featureSet.fsetid=core_mps.fsetid",
>>>>     "AND featureSet.type=type_dict.type"))
>>>>
>>>> I'll make sure that, in the next releases, the users are not
>>>> expected to figure out queries like this.
>>>>
>>>> Put simply: the probeset db is at the exon level; the transcript
>>>> cluster db is at the gene level.
>>>>
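To make the exon-level versus gene-level distinction concrete: in the pd package, core_mps links each probeset id (fsetid) to a transcript cluster id (meta_fsetid), and several probesets can share one transcript cluster. A small sketch with an invented mapping (all ids are hypothetical) shows the many-to-one relationship:

```r
# Invented core_mps-style mapping: five probesets, two transcript clusters.
core_mps <- data.frame(meta_fsetid = c(9001, 9001, 9001, 9002, 9002),
                       fsetid      = c(101, 102, 103, 201, 202))

n_probesets   <- length(unique(core_mps$fsetid))       # exon-level units
n_transcripts <- length(unique(core_mps$meta_fsetid))  # gene-level units

# Number of probesets behind each transcript cluster.
per_transcript <- table(core_mps$meta_fsetid)
print(per_transcript)
```

Summarizing at the gene level, as with rma(OligoRaw, target="core") elsewhere in this thread, yields one value per transcript cluster rather than one per probeset.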
>>>> b
>>>>
>>>> On Nov 5, 2009, at 8:59 AM, Javier Pérez Florido wrote:
>>>>
>>>>> Thanks to everybody,
>>>>> I'm new to the HuGene ST 1.0 array and have some questions:
>>>>>
>>>>> * I have normalized some CEL files using the oligo package, and
>>>>> the annotation package used by default is pd.hugene.1.0.st.v1.
>>>>> How can I access this annotation package to check the types of
>>>>> control probe sets used? I've tried:
>>>>>
>>>>> conn<-db(pd.hugene.1.0.st.v1)
>>>>> dbListTables(conn)
>>>>> [1] "bgfeature" "chrom_dict" "core_mps" "featureSet" "level_dict"
>>>>> [6] "pmfeature" "table_info" "type_dict"
>>>>> dbListFields(conn,"featureSet")
>>>>> [1] "fsetid" "strand" "start" "stop"
>>>>> [5] "transcript_cluster_id" "exon_id" "crosshyb_type" "level"
>>>>> [9] "chrom" "type"
>>>>> sql="SELECT fsetid,type FROM featureSet"
>>>>> dbGetQuery(conn,sql)
>>>>> But I get integers (1, 2, 3, ...) in the type field instead of
>>>>> control probe set labels such as "AFFX*", "other-spike", etc.
>>>>> How can I get this information from the annotation package?
>>>>>
>>>>> * What is the difference between hugene10stprobeset.db and
>>>>> hugene10sttranscriptcluster.db? What is the difference between
>>>>> summarizing at the probe set level and at the gene level?
>>>>>
>>>>> Thanks again,
>>>>> Javier
>>>>> P.S. If you know of any document that could help me with these
>>>>> arrays, it would be great.
>>>>>
>>>>> R version 2.10.0 (2009-10-26)
>>>>> i386-pc-mingw32
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
>>>>> LC_MONETARY=Spanish_Spain.1252
>>>>> [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] annotate_1.24.0 AnnotationDbi_1.8.0 pd.hugene.1.0.st.v1_3.0.0 RSQLite_0.7-3
>>>>> [5] DBI_0.2-4 oligo_1.10.0 preprocessCore_1.8.0 oligoClasses_1.8.0
>>>>> [9] Biobase_2.6.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] affxparser_1.18.0 affyio_1.14.0 Biostrings_2.14.0 IRanges_1.4.0 splines_2.10.0 tools_2.10.0
>>>>> [7] xtable_1.5-5
>>>>>
>>>>>
>>>>> cstrato wrote:
>>>>>> Dear Javier,
>>>>>>
>>>>>> When you open the Affymetrix annotation files for the HuGene ST 1.0
>>>>>> array you will see that it does contain 13 AFFX controls and a
>>>>>> number of "other_spike" controls in both the transcript and the
>>>>>> probeset annotation files. The MoGene array contains 22
>>>>>> "control->affx" probesets including 13 AFFX controls (bac_spike,
>>>>>> polya_spike).
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>> Javier Pérez Florido wrote:
>>>>>>> Dear list,
>>>>>>> I would like to know if the GeneChip Human Gene ST 1.0 array has
>>>>>>> some
>>>>>>> gene controls (like AFFX genes in other Affymetrix
>>>>>>> technologies).
>>>>>>> Thanks in advance,
>>>>>>> Javier
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>