[BioC] using frma and brainarray cdf on exon arrays

Steve Piccolo stephen.piccolo at hsc.utah.edu
Wed Jan 15 23:31:44 CET 2014


Hi Benilton,

Thanks for your detailed response. The oligo package is fantastic!

When processing exon arrays (or any Affy array, for that matter), we use
oligo to read CEL files (extract raw probe intensities) and to obtain the
PM sequence for each probe. If the user specifies a BrainArray package
(via the probeSummaryPackage parameter), we map the probes to the
BrainArray annotations using x and y coordinates and then summarize based
on which probes map to which genes in their annotations (we don’t use a
CDF). If the user does not specify a BrainArray package, we map the probes
to the probesets provided in the oligo package. Optionally, users can
indicate which target probes they want to use (via the exonArrayTarget
parameter). If they are using BrainArray mappings with exon arrays, we
suggest that they specify “probeset”. I can see why this might be
confusing, because we don’t actually use the probeset definitions, but
this option gives us access to all probes on the array, whereas the other
targets do not.
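
A minimal sketch of the coordinate-based mapping described above, in base
R. The data frames, coordinates, intensities, and gene IDs are all made up,
and the median summary is chosen only for illustration; this is not the
actual SCAN.UPC code:

```r
# Toy data only: coordinates, intensities, and gene IDs are made up.
# Raw probe-level intensities as they might be read from a CEL file.
probes <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(1, 1, 2, 2, 3),
  intensity = c(100, 400, 50, 200, 800)
)

# Hypothetical remapped annotation: which (x, y) probe belongs to which gene.
# The probe at (5, 3) is deliberately absent from the annotation.
annotation <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(1, 1, 2, 2),
  gene = c("1000_at", "1000_at", "2000_at", "2000_at"),
  stringsAsFactors = FALSE
)

# Join on array coordinates; probes without an annotation entry drop out.
mapped <- merge(probes, annotation, by = c("x", "y"))

# Summarize log2 intensities per gene (median used here for illustration).
gene_values <- tapply(log2(mapped$intensity), mapped$gene, median)
print(gene_values)
```

The point of starting from all probes is visible in the join: a probe can
only contribute to a gene if it is present on both sides of the merge.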

I’m not stating this to claim that our approach is better than using CDFs.
But so far our approach has been working well for us and seems to give us
more control over the processing. If you see a fatal flaw with this
approach, please let me know, and I’m happy to reconsider.

I just used BrainArray annotations on some data profiled on both exon
arrays and U133 arrays and saw strong correlation between the two at the
gene level, so it seems to be working well.
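
For anyone wanting to run the same kind of check, a rough base R sketch
follows. The values and gene IDs are invented purely to show the
mechanics (align by shared gene identifier, then correlate) and are not
the results mentioned above:

```r
# Toy gene-level values; all numbers are invented for illustration.
exon_values <- c("1000_at" = 7.1, "2000_at" = 9.4,
                 "3000_at" = 5.2, "4000_at" = 11.0)
u133_values <- c("2000_at" = 9.0, "1000_at" = 6.8,
                 "4000_at" = 10.5, "5000_at" = 8.3)

# Compare only genes present on both platforms, aligned by identifier.
shared <- intersect(names(exon_values), names(u133_values))
rho <- cor(exon_values[shared], u133_values[shared], method = "spearman")
print(rho)
```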

Regards,
-Steve


From:  Benilton Carvalho <beniltoncarvalho at gmail.com>
Date:  Tuesday, January 14, 2014 at 10:18 AM
To:  Ty Thomson <tthomson at selventa.com>
Cc:  Stephen Piccolo <stephen.piccolo at hsc.utah.edu>,
"bioconductor at r-project.org" <bioconductor at r-project.org>
Subject:  Re: [BioC] using frma and brainarray cdf on exon arrays


I may have misunderstood what Steve described... So I'm writing just to
lay out my understanding of what oligo does re: exon arrays:

1) oligo uses the official PGF/MPS files provided by Affymetrix to proceed
with preprocessing;

2) given the design of the chip, there are different probeset definitions
(which are given by Affymetrix through the MPS files). These different
definitions are: core ("most reliable evidence from RefSeq and
full-length mRNA GenBank records"), extended ("supported by other cDNA
evidence beyond what is used to support core probe sets") and full
("supported by computational gene prediction evidence only").

3) oligo uses the definitions above to perform preprocessing (this can be
changed through the 'target' argument in rma()). The user can also use
the value 'probeset' to use the probeset definition given in the PGF.
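
As a rough sketch, choosing one of these definitions through the 'target'
argument could look like the following. The wrapper name
run_rma_at_target and its cel_files argument are hypothetical, and which
targets are valid depends on the array type, so check ?rma in oligo for
your chip:

```r
# Hypothetical helper: runs oligo's RMA at a chosen summarization target.
# 'cel_files' is an assumed character vector of CEL file paths.
run_rma_at_target <- function(cel_files,
                              target = c("core", "extended",
                                         "full", "probeset")) {
  # Reject anything that is not one of the four definitions listed above.
  target <- match.arg(target)
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the oligo package must be installed first")
  }
  raw <- oligo::read.celfiles(filenames = cel_files)
  oligo::rma(raw, target = target)
}
```

Calling it with target = "probeset" would correspond to the PGF probeset
definition mentioned in point 3.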

Now, re: what the BrainArray group does... they start with *only* chip
coordinates and probe sequences, nothing else (note: this is independent
from Affymetrix probeset definitions and, therefore, not related to the
probeset definitions that we use in oligo - which come from Affymetrix).
These sequences are then aligned and reannotated. This results in new
probeset definitions, which they provide as what I call "BrainArray CDFs"
(note that after this procedure, there is no such thing as
core/full/extended probesets)...

For the moment, oligo does not officially support the remapped CDFs. But
that's something you should expect in the near future.

b



2014/1/14 Ty Thomson <tthomson at selventa.com>

Hi Steve,

Thanks, that makes a lot of sense.  I might just need to get another R
installation going so I can directly assess the effect of using all the
probes on my data.

Regards,

Ty

-----Original Message-----
From: Steve Piccolo [mailto:stephen.piccolo at hsc.utah.edu]

Sent: Tuesday, January 14, 2014 9:35 AM
To: Ty Thomson; bioconductor at r-project.org
Subject: Re: using frma and brainarray cdf on exon arrays


Hi Ty,

The exon arrays have about 5 million probes. By default, the oligo package
uses only a subset of these probes, which have been flagged as being most
useful. However, the BrainArray package may want to include probes that
are not included by oligo. Thus it may be valuable to start with all
probes (exonArrayTarget="probeset") so that
all probes included in the BrainArray mappings will be included. The
previous version of SCAN.UPC uses the default oligo probes.
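
A hedged sketch of what such a call might look like. The wrapper name
scan_with_brainarray and its argument names are hypothetical; only SCAN,
probeSummaryPackage, and exonArrayTarget come from the package as
discussed in this thread, and it assumes a SCAN.UPC version that supports
exonArrayTarget:

```r
# Hypothetical wrapper around SCAN.UPC::SCAN; the function and argument
# names here are illustrative, not part of the package.
scan_with_brainarray <- function(cel_pattern, brainarray_pkg,
                                 target = "probeset") {
  if (!requireNamespace("SCAN.UPC", quietly = TRUE)) {
    stop("the SCAN.UPC package must be installed first")
  }
  # exonArrayTarget = "probeset" starts from every probe on the array, so
  # no probe used by the BrainArray mapping is filtered out beforehand.
  SCAN.UPC::SCAN(cel_pattern,
                 probeSummaryPackage = brainarray_pkg,
                 exonArrayTarget = target)
}
```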

Regards,
-Steve

On 1/14/14, 7:27 AM, "Ty Thomson" <tthomson at selventa.com> wrote:

>Hi Steve,
>
>Thanks for the suggestion.  I'm running bioconductor 2.12 with
>SCAN.UPC_2.0.2, and it doesn't look like this version of SCAN supports
>the exonArrayTarget parameter.  After reading the documentation for the
>exonArrayTarget parameter in the most recent version of SCAN I don't
>really understand exactly what it does.  Can you give me a bit more
>information on the difference when using a brain array CDF
>(probeSummaryPackage=hgu133plus2hsentrezgprobe) between what happens in
>my current version of SCAN and in the newest version when setting
>exonArrayTarget="probeset"?  I am reluctant to change versions of
>Bioconductor mid-project because of the effects this could have on other
>aspects of the project (and the potential need to re-run everything
>again).
>
>Regards,
>
>Ty
>
>-----Original Message-----
>From: Steve Piccolo [mailto:stephen.piccolo at hsc.utah.edu]
>Sent: Tuesday, January 14, 2014 8:25 AM
>To: bioconductor at r-project.org; Ty Thomson
>Subject: Re: using frma and brainarray cdf on exon arrays
>
>Hi Ty,
>
>Another option is to try the SCAN.UPC package. It is also designed to
>support single-sample normalization. And it should be able to integrate
>the BrainArray annotations for the exon arrays. In the SCAN function,
>pay attention to the probeSummaryPackage and exonArrayTarget
>parameters. And the InstallBrainArrayPackage function may come in
>handy.  Let me know if you run into any troubles with it.
>
>Regards,
>-Steve
>
>
>
>On 1/14/14, 4:00 AM, "bioconductor-request at r-project.org"
><bioconductor-request at r-project.org> wrote:
>
>>Date: Mon, 13 Jan 2014 20:25:44 +0000
>>From: Ty Thomson <tthomson at selventa.com>
>>To: Matthew McCall <mccallm at gmail.com>
>>Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>Subject: Re: [BioC] using frma and brainarray cdf on exon arrays
>>Message-ID:
>><f71eeb981506499f9b6e0c317b4eb53a at CO2PR04MB601.namprd04.prod.outlook.com>
>>
>>Content-Type: text/plain; charset="iso-8859-1"
>>
>>Hi Matt,
>>
>>Thanks for the quick reply and the suggestions.  I'll look into making
>>my own pd annotation and frmavecs packages.
>>
>>Ty
>>
>>
>>-----Original Message-----
>>From: Matthew McCall [mailto:mccallm at gmail.com]
>>Sent: Monday, January 13, 2014 3:14 PM
>>To: Ty Thomson
>>Cc: bioconductor at r-project.org
>>Subject: Re: using frma and brainarray cdf on exon arrays
>>
>>Ty,
>>
>>Unfortunately, what you are attempting to do is not currently
>>implemented. The frmavecs packages for the 3' Affy arrays (HGU133a,
>>HGU133plus2, etc) contain implementations for both the Affy and
>>BrainArray CDFs. The issue is with the newer Affy arrays (HuGene 1.0
>>ST, etc.), which should really be read in using the oligo package and
>>corresponding pd annotation packages. This is implemented in frma for
>>the Affy annotation, but currently the BrainArray folks provide CDFs,
>>which can't be readily used. This may be changing fairly soon -- I
>>believe the BrainArray group may start offering oligo compatible
>>annotation packages.
>>Until then, doing what you would like would require: (1) making your
>>own pd annotation package corresponding to the BrainArray alternative
>>CDF, and (2) creating your own frmavecs package for the alternative
>>annotation.
>>
>>Best,
>>Matt
>>
>>
>>On Mon, Jan 13, 2014 at 2:53 PM, Ty Thomson <tthomson at selventa.com>
>>wrote:
>>>Apologies if any of my questions are naïve, as I don't have a lot of
>>>experience with exon arrays.  I would like to analyze an Affy HuGene
>>>1.0 ST array using the brainArray CDF (because I want to get
>>>expression values on a per gene level), and use fRMA (to correct for
>>>batch effects and enable me to add additional samples in the future
>>>without redoing RMA).  Is this possible?  Here's what I've tried
>>>without much success thus far:
>>>
>>>
>>>
>>>I can skip brainarray and just load the data and run fRMA without error:
>>>
>>>>exonFS <- read.celfiles(filenames=cel.files, verbose=FALSE,
>>>    celfile.path=NULL)
>>>>tmp.eset <- frma(exonFS, summarize="median_polish")
>>>
>>>But when I use ReadAffy and specify the cdf, fRMA generates an error:
>>>
>>>>Affybatch <- ReadAffy(filenames=cel.files, verbose=FALSE,
>>>    celfile.path=NULL, cdfname="hugene10sthsentrezg")
>>>Warning message:
>>>The affy package can process data from the Gene ST 1.x series of arrays,
>>>but you should consider using either the oligo or xps packages, which are
>>>specifically designed for these arrays.
>>>
>>>>tmp.eset <- frma(Affybatch, summarize="median_polish")
>>>Error in frmaMedPol(object, background, normalize, target, input.vecs,  :
>>>   hugene10stfrmavecs package must be installed first
>>>In addition: Warning message:
>>>In library(package, lib.loc = lib.loc, character.only = TRUE,
>>>logical.return = TRUE,  :
>>>   there is no package called 'hugene10stfrmavecs'
>>>
>>>When I try running fRMA while directly passing the frmavecs to the
>>>function I get a different error:
>>>
>>>>data(hugene.1.0.st.v1frmavecs)
>>>>tmp.eset <- frma(Affybatch, summarize="median_polish",
>>>    input.vecs=hugene.1.0.st.v1frmavecs)
>>>Warning message:
>>>In log2(pms) - input.vecs$probeVec :
>>>   longer object length is not a multiple of shorter object length
>>>
>>>Thanks in advance for any help,
>>>
>>>Ty
>>>
>>>>sessionInfo()
>>>R version 3.0.1 (2013-05-16)
>>>Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>>locale:
>>>[1] LC_COLLATE=English_United States.1252
>>>[2] LC_CTYPE=English_United States.1252
>>>[3] LC_MONETARY=English_United States.1252
>>>[4] LC_NUMERIC=C
>>>[5] LC_TIME=English_United States.1252
>>>
>>>attached base packages:
>>>[1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>[8] base
>>>
>>>other attached packages:
>>> [1] hugene10sthsentrezgcdf_17.1.0  AnnotationDbi_1.22.6
>>> [3] hugene.1.0.st.v1frmavecs_0.0.3 pd.hugene.1.0.st.v1_3.8.0
>>> [5] RSQLite_0.11.4                 DBI_0.2-7
>>> [7] frma_1.12.0                    oligo_1.24.2
>>> [9] oligoClasses_1.22.0            affy_1.38.1
>>>[11] Biobase_2.20.1                 BiocGenerics_0.6.0
>>>
>>>loaded via a namespace (and not attached):
>>> [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.4
>>> [4] Biostrings_2.28.0     bit_1.1-11            codetools_0.2-8
>>> [7] ff_2.2-12             foreach_1.4.1         GenomicRanges_1.12.5
>>>[10] IRanges_1.18.4        iterators_1.0.6       MASS_7.3-29
>>>[13] preprocessCore_1.22.0 splines_3.0.1         stats4_3.0.1
>>>[16] tools_3.0.1           zlibbioc_1.6.0
>>
>
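
The last warning in the original post (from log2(pms) -
input.vecs$probeVec) is R's vector recycling at work: the two vectors
being subtracted have different lengths, which is what one would expect
if the probes selected by one probeset definition are paired with probe
effects estimated under another. A toy base R illustration, with made-up
numbers and hypothetical variable names standing in for the real objects:

```r
# Toy stand-ins: 'pms' plays the role of PM intensities extracted under one
# probeset definition, 'probe_vec' the probe effects estimated under another.
pms <- c(120, 340, 560, 780, 910)   # 5 probes
probe_vec <- c(0.1, -0.2, 0.3)      # 3 probe effects

# Subtracting vectors of unequal length recycles the shorter one and warns
# when the longer length is not a multiple of the shorter one.
msg <- tryCatch(
  {
    log2(pms) - probe_vec
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# A defensive check that flags the mismatch before any arithmetic is done.
lengths_match <- length(pms) == length(probe_vec)
print(lengths_match)
```

Because recycling still returns a result, a mismatch like this is worth
treating as an error rather than a warning.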

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor