[BioC] two questions regarding Human Gene 1.0 ST arrays

Tue Apr 26 20:42:46 CEST 2011

Dear Javier,

I would suggest looking at the Affymetrix annotation file
"HuGene-1_0-st-v1.na31.hg19.probeset.csv".

There you can see the following for, e.g., SAMD11:
transcript_cluster_id = 7896761
number of probesets for 7896761 is 17
number of probes per probeset is 2 (with 2 exceptions)

Then you can open "HuEx-1_0-st-v2.na31.hg19.probeset.csv" and compare 
the data for yourself.
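
The comparison can be sketched in base R as follows. This is only a minimal illustration: the rows in toy_csv are made up, and the column names (probeset_id, transcript_cluster_id, probe_count) are assumed to match those in the Affymetrix probeset annotation files, so please check them against the real header. The helper summarise_probesets() is hypothetical, not part of any package.

```r
# Toy stand-in for a probeset annotation file. The rows are invented for
# illustration; the column names are assumptions about the real file.
toy_csv <- paste(
  "probeset_id,transcript_cluster_id,probe_count",
  "1001,9001,2",
  "1002,9001,2",
  "1003,9001,1",
  "1004,9002,4",
  "1005,9002,4",
  sep = "\n"
)

# Hypothetical helper: for each transcript cluster, count its probesets
# and report the median number of probes per probeset.
summarise_probesets <- function(annot) {
  per_cluster <- split(annot$probe_count, annot$transcript_cluster_id)
  data.frame(
    transcript_cluster_id = names(per_cluster),
    n_probesets = vapply(per_cluster, length, integer(1)),
    median_probes = vapply(per_cluster,
                           function(x) median(as.numeric(x)),
                           numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# comment.char = "#" makes read.csv ignore lines starting with '#',
# which the real annotation files are assumed to begin with.
annot <- read.csv(text = toy_csv, comment.char = "#")
res <- summarise_probesets(annot)
print(res)
```

For the real files, replacing `text = toy_csv` with the path to each CSV and running the same helper on both should allow a side-by-side comparison of the Gene and Exon array designs.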

Best regards
Christian

On 4/26/11 11:13 AM, Javier Pérez Florido wrote:
> Dear Christian,
> Thanks for your reply. However, is it right to assert that, for Gene
> arrays, probesets target a particular exon of a particular gene and
> transcript clusters enable gene-level estimates, as on Exon arrays, but
> using fewer probes per exon?
>
> Thanks,
> Javier
>
>
> On 25/04/2011 21:13, cstrato wrote:
>> While Exon ST arrays usually have 4 probes per probeset, Gene ST
>> arrays have only 1-2 probes per probeset. Thus my personal opinion is
>> that Gene ST arrays should not be used to detect alternative splicing
>> events.
>>
>> However, tools such as FIRMAGene exist for this purpose, see:
>> http://bioinf.wehi.edu.au/folders/firmagene/
>>
>> Best regards
>> Christian
>>
>>
>> On 4/25/11 8:54 PM, Javier Pérez Florido wrote:
>>> Sorry, I always forget sessionInfo(), see below
>>>
>>> You are right: for Human Gene ST arrays at the transcript level, only
>>> "core" mode exists. However, after running
>>> fit<-fitPLM(OligoRaw)
>>> where OligoRaw is the set of raw data, the size of the "fit" object is
>>> 257,430, and when the following command is executed
>>>
>>> OligoEset<-rma(OligoRaw,target="probeset")
>>>
>>> OligoEset also has 257,430 features. So the RMA procedure "inside" the
>>> fitPLM function performs its normalization at the probeset level.
>>>
>>> On the other hand, summarization using RMA can be performed at the
>>> transcript level in the following way:
>>> OligoEset<-rma(OligoRaw,target="core")
>>>
>>> which yields around 33000 transcripts.
>>>
>>> I'm still confused about the concepts of "probeset" and "transcript" on
>>> Human Gene Arrays.
>>>
>>> For Exon arrays, probesets consist of four individual probes and
>>> usually target a particular exon of a particular gene. Thus exon-level
>>> intensity estimates correspond to the probeset-level estimates.
>>> Probesets are further grouped into transcript clusters, enabling
>>> gene-level estimates to be computed by summarizing data from all probes
>>> within the transcript cluster.
>>>
>>> However, I don't know if I can assert that, for Gene arrays, probesets
>>> target a particular exon of a particular gene and transcript clusters
>>> enable gene-level estimates, as on Exon arrays. The only difference
>>> would be that Exon arrays have two more "annotation levels" with
>>> lower confidence (extended and full). Otherwise, what is the utility of
>>> summarizing at the probeset level on Hu Gene arrays?
>>>
>>> This is related to my second question: can HuGene detect
>>> alternative splicing events reliably? Can HuGene be used as an economical
>>> exon array for just the well-annotated content (core)?
>>>
>>> Thanks again,
>>> Javier
>>>
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
>>> LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>>> [5] LC_TIME=Spanish_Spain.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] pd.hugene.1.0.st.v1_3.0.2 hugene10sttranscriptcluster.db_7.0.1
>>> org.Hs.eg.db_2.5.0 RSQLite_0.9-4
>>> [5] DBI_0.2-5 AnnotationDbi_1.14.1 oligo_1.16.0 oligoClasses_1.14.0
>>> [9] affyPLM_1.28.5 preprocessCore_1.14.0 gcrma_2.24.1 affy_1.30.0
>>> [13] Biobase_2.12.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.24.0 affyio_1.20.0 Biostrings_2.20.0 bit_1.1-6 ff_2.2-1
>>> IRanges_1.10.0 splines_2.13.0 tools_2.13.0
>>>
>>>
>>>
>>>
>>> On 25/04/2011 19:36, cstrato wrote:
>>>> Dear Javier,
>>>>
>>>> Since you do not supply your sessionInfo() it is not possible to
>>>> answer your question.
>>>>
>>>> However, please note that the levels core, extended and full exist
>>>> only for Exon ST arrays, not for Gene ST arrays.
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>> e.m.a.i.l: cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._._._
>>>>
>>>>
>>>> On 4/25/11 7:24 PM, Javier Pérez Florido wrote:
>>>>> Dear list,
>>>>> I have two questions regarding Human Gene 1.0 ST arrays:
>>>>>
>>>>> * Both NUSE and RLE plots need a fitted object from the fitPLM
>>>>> function. This function accepts raw data from a set of Hu
>>>>> Gene 1.0 arrays but internally performs an RMA
>>>>> normalization. What level is used for this normalization? I cannot
>>>>> choose the level (i.e. core, full, extended) for the "internal"
>>>>> normalization.
>>>>> * Is a splicing analysis using Hu Gene 1.0 arrays (core analysis)
>>>>> equivalent, in terms of results, to a splicing analysis using
>>>>> Hu Exon 1.0 arrays (core analysis)?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Javier
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>