[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

James Perkins jperkins at biochem.ucl.ac.uk
Wed Jun 27 16:25:46 CEST 2012


Thanks for the pointer Andreas,

How did you go from probe sets for a given gene to the transcript
level? And how did you know if it was "core", "extended", "full"
confidence?

Also, how did you summarise the probeset expression levels to make a
transcript? Using biomart I get ~25k unique ensembl genes mapping to
probe set ids, which is much higher than when I follow the oligo
pipeline and perform RMA at core/extended/full level, and use getAffx
for annotation.

Thanks,

Jim

On 27 June 2012 16:03, Andreas Heider <aheider at trm.uni-leipzig.de> wrote:
> Dear Jim,
> I pulled all relevant annotation via biomaRt, as BioMart has all mappings of
> exon array probeset IDs to e.g. ENTREZID or GENESYMBOL. Then you can go on
> from that.
>
> Cheers,
> Andreas
>
>
> 2012/6/27 James Perkins <jperkins at biochem.ucl.ac.uk>
>>
>> Hi,
>>
>> I wasn't sure whether this was worth starting a new thread for, since
>> my question is very much related to this thread...
>>
>> Is there any plan to include the "comprehensive" exon array mappings?
>>
>> E.g. for rat:
>>
>> If one goes here
>>
>>
>> http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1
>>
>> Then to Technical Documentation tab
>>
>> And downloads the
>>
>> "Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core, full,
>> extended and comprehensive rn4" data
>>
>>
>> http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip
>>
>> There are the core/extended/full ps and mps files here.
>>
>> However there is also a comprehensive mps file.
>>
>> Full, core and extended are from 2006.
>>
>> The comprehensive is from 2010 (and gets updated more regularly), so
>> perhaps would be a better file to use for getNetAffx ?
>>
>> Apologies if this has been covered before. I am never sure of what is
>> the best way to analyse exon array data at the gene level.
>>
>> Thanks,
>>
>> Jim
>>
>>
>>
>>
>> On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com>
>> wrote:
>> >
>> > please correct the code below to:
>> >
>> > eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>> >
>> > and if you want results at the exon level
>> >
>> > eset = rma(raw, target='probeset')
>> > featureData(eset) = getNetAffx(raw, 'probeset')
>> >
>> > apologies for the mistake below.
>> >
>> > b
>> >
>> > On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com>
>> > wrote:
>> > > FWIW, remember that you can obtain the contents of the annotation
>> > > files (the NA32 Affymetrix files) with:
>> > >
>> > > library(Biobase)
>> > > library(oligo)
>> > > raw = read.celfiles(list.celfiles())
>> > > eset = rma(raw, target='transcript')
>> > > featureData(eset) = getNetAffx(eset, 'transcript')
>> > > head(fData(eset))
>> > >
>> > > b
>> > >
>> > > On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
>> > >> Hi Andreas,
>> > >>
>> > >>
>> > >> On 6/13/2012 3:14 AM, Andreas Heider wrote:
>> > >>>
>> > >>> Dear mailing list,
>> > >>> I know this was on the list a couple of times, and I think I read it all,
>> > >>> but actually I still don't get it right. So here is my problem:
>> > >>>
>> > >>> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene
>> > >>> 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
>> > >>> That means I want to finally have it accessible as an ExpressionSet object
>> > >>> with the right Bioconductor annotation assigned. This should include GENE
>> > >>> SYMBOLS, RefSeq IDs and ENTREZ IDs.
>> > >>
>> > >>
>> > >> The problem here is that you want to do something that AFAIK isn't easy to
>> > >> do. The Gene ST arrays allow you to summarize all the probes that
>> > >> interrogate a particular transcript (e.g., all the exon-level probesets are
>> > >> collapsed to transcript level, and then you summarize). However, for the
>> > >> Exon ST arrays that isn't the case, unless there is something in xps to
>> > >> allow for that - I know next to nothing about that package, so Cristian
>> > >> Stratowa will have to chime in if I am missing something.
>> > >>
>> > >> For the Exon chips, you are always summarizing at the same probeset level,
>> > >> where there are <= 4 probes per probeset, and there can be any number of
>> > >> probesets that interrogate a given exon. Lots of these probesets interrogate
>> > >> regions that aren't even transcribed, according to current knowledge of the
>> > >> genome. When you choose core, extended or full probesets, you are just
>> > >> changing the number of probesets being used, not summarizing at a different
>> > >> level as with the Gene ST chip.
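One way to picture the point about core, extended and full is that the levels are nested, so picking a level only changes which probesets are allowed to contribute. The following is a toy sketch in base R, not what oligo does internally: the expression matrix, the annotation table and the helper summarise_genes() are all made up for illustration, and a plain per-gene median stands in for the median polish that RMA uses.

```r
# Toy log2 expression matrix: 5 probesets x 2 samples (made-up values).
expr <- matrix(c(8.0, 8.2,
                 7.5, 7.9,
                 3.1, 3.0,
                 9.0, 9.4,
                 2.0, 2.2),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("ps", 1:5), c("s1", "s2")))

# Hypothetical annotation: gene membership and evidence level per probeset.
annot <- data.frame(
  probeset_id = paste0("ps", 1:5),
  gene  = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  level = c("core", "core", "extended", "core", "full"),
  stringsAsFactors = FALSE
)

# The levels are nested: core is contained in extended, extended in full.
levels_included <- list(core     = "core",
                        extended = c("core", "extended"),
                        full     = c("core", "extended", "full"))

# Per-gene, per-sample median over the probesets admitted at `target`.
# (Hypothetical helper; a simple median, not RMA's median polish.)
summarise_genes <- function(expr, annot, target) {
  keep <- annot[annot$level %in% levels_included[[target]], ]
  groups <- split(keep$probeset_id, keep$gene)
  t(sapply(groups, function(ids) {
    apply(expr[ids, , drop = FALSE], 2, median)
  }))
}

core_summary <- summarise_genes(expr, annot, "core")
full_summary <- summarise_genes(expr, annot, "full")
print(core_summary)
print(full_summary)
```

In this toy data the low-signal probesets ps3 and ps5 only enter at the extended and full levels, so the gene-level values shift when the target changes even though the genes themselves stay the same.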
>> > >>
>> > >> So when you say you want gene symbols, refseq ids and gene ids, what exactly
>> > >> are you after? If a given probeset is in the intron of a gene do you want to
>> > >> annotate it as being part of that gene? How about if it is in the UTR (or
>> > >> really close to the UTR)? What do you want to do with the probesets where
>> > >> one or more of the probes binds in multiple positions in the genome? These
>> > >> are all questions that the exonmap package tries to consider, and it gets
>> > >> really complicated. That's why Affy went with the Gene ST chips - they
>> > >> unleashed the Exon chips on us and couldn't sell them because people were
>> > >> saying WTF do I do with this thing?
>> > >>
>> > >> I don't think there is an easy or obvious answer to your question. If you
>> > >> were to come up with what you think are reasonable answers to my questions,
>> > >> then it wouldn't be much work to extract the chr, start, end from the
>> > >> pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g.,
>> > >> findOverlaps()) to decide what regions are being interrogated, and annotate
>> > >> from there.
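The overlap-based annotation described above can be sketched roughly as follows. This is a minimal illustration in base R with made-up coordinates, not a recipe for the real array: the probesets and genes tables are hypothetical stand-ins for what would be extracted from pd.moex.1.0.st.v1 and from a gene model source, and the plain interval test stands in for findOverlaps() on proper genomic range objects. It also takes the simplest possible answer to the questions above, annotating a probeset with any gene whose span it touches.

```r
# Hypothetical probeset coordinates (in practice extracted from the
# pd.moex.1.0.st.v1 annotation package); all values are made up.
probesets <- data.frame(
  probeset_id = c("ps1", "ps2", "ps3", "ps4"),
  chr   = c("chr1", "chr1", "chr2", "chr2"),
  start = c(100, 450, 1000, 5000),
  end   = c(125, 475, 1025, 5025),
  stringsAsFactors = FALSE
)

# Hypothetical gene models (in practice taken from a transcript database).
genes <- data.frame(
  gene  = c("GeneA", "GeneB"),
  chr   = c("chr1", "chr2"),
  start = c(50, 900),
  end   = c(500, 1100),
  stringsAsFactors = FALSE
)

# Pair every probeset with every gene on the same chromosome, then keep
# the pairs whose intervals overlap by at least one base.
pairs <- merge(probesets, genes, by = "chr", suffixes = c(".ps", ".gene"))
hits  <- pairs[pairs$start.ps <= pairs$end.gene &
               pairs$end.ps   >= pairs$start.gene, ]

# Left join back onto the probeset list so that probesets without any
# overlapping gene (here ps4) are kept with gene = NA.
annotation <- merge(probesets["probeset_id"],
                    hits[, c("probeset_id", "gene")],
                    by = "probeset_id", all.x = TRUE)
print(annotation)
```

The all-pairs merge is fine for a toy table but would be far too slow for the full set of exon array probesets, which is where an interval-tree based overlap search becomes necessary.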
>> > >>
>> > >> Best,
>> > >>
>> > >> Jim
>> > >>
>> > >>
>> > >>
>> > >>>
>> > >>> I can import it as an AffyBatch and generate an ExpressionSet with the help
>> > >>> of the Xmap/exonmap supplied CDF, but there is no annotation attached to
>> > >>> it.
>> > >>>
>> > >>> OR
>> > >>>
>> > >>> I can import the CEL files with the "oligo" package as an Exon Array object
>> > >>> and generate an ExpressionSet from it.
>> > >>> However, in that case it still has no annotation.
>> > >>>
>> > >>> Surprisingly, on the Bioconductor website there are all the packages needed
>> > >>> to deal with Mouse Gene 1.0 ST arrays, but the information to work with
>> > >>> Mouse Exon 1.0 ST arrays seems to be missing!
>> > >>>
>> > >>> What am I doing wrong here? Has someone else had such problems?
>> > >>>
>> > >>> Thanks in advance for your effort,
>> > >>> Andreas
>> > >>>
>> > >>> _______________________________________________
>> > >>> Bioconductor mailing list
>> > >>> Bioconductor at r-project.org
>> > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > >>> Search the archives:
>> > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >>
>> > >>
>> > >> --
>> > >> James W. MacDonald, M.S.
>> > >> Biostatistician
>> > >> University of Washington
>> > >> Environmental and Occupational Health Sciences
>> > >> 4225 Roosevelt Way NE, # 100
>> > >> Seattle WA 98105-6099
>
>
