[BioC] makePdInfoPackage for Primeview arrays

Wed Jun 19 16:01:54 CEST 2013

Hi Max,

It seems that the PrimeView array is in a sort of no-man's land. The cdf 
file that I have, and use to build the cdf package, is a text cdf. Why 
this matters is covered here:

https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html

The upshot is that the software used to produce the cdf packages for 
use with the affy package will only use a multiply-mapped probe one 
time when it reads a text cdf (i.e., if a probe is used in two or more 
probesets, it will only be mapped to one probeset when producing the 
cdf package). I just checked, and the current version of the cdf is 
still text.
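
If you want to see how many probes are affected, you can count the cell 
indices that are listed more than once in the cdf. A minimal sketch, 
assuming affxparser is installed; the helper name count_shared_cells and 
the path "PrimeView.cdf" are placeholders of mine, not part of any package:

```r
# Count CDF cells (probes) that are listed more than once in the CDF.
# 'cells' is the nested list returned by affxparser::readCdfCellIndices():
# unit -> groups -> group -> indices.
count_shared_cells <- function(cells) {
  idx <- unlist(cells, use.names = FALSE)
  length(unique(idx[duplicated(idx)]))
}

cdf_file <- "PrimeView.cdf"  # placeholder path, adjust to your setup
if (file.exists(cdf_file) && requireNamespace("affxparser", quietly = TRUE)) {
  cells <- affxparser::readCdfCellIndices(cdf_file)
  cat("Probes listed more than once:", count_shared_cells(cells), "\n")
}
```

A non-zero count means that some probes are shared between probesets.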

So if you use the cdf that you automatically get from BioC, then you are 
explicitly excluding some probes from some probesets. And this is how 
things will remain, as we are just offering a converted version of what 
we get from the Affy website.

But you do have options if you want something different.

First, you can use the affxparser package to convert the cdf file from 
Affy, which you can get here:

http://www.affymetrix.com/support/downloads/library_files/primeview_libraryfile.zip

Unzip it, open CD_PrimeView_rev01/Full/PrimeView/LibFiles, and put 
PrimeView.cdf somewhere useful. You can then use convertCdf() to 
make a binary format cdf, and then use the makecdfenv package to make a 
cdf package that will have all of the multiply-mapped probes in each 
probeset.
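
Roughly, the conversion could look like the following. This is an 
untested sketch, assuming affxparser and makecdfenv are installed; the 
paths are placeholders, and the species string is my assumption for a 
human array:

```r
# Sketch: text CDF -> binary CDF -> cdf package source directory.
# All paths below are placeholders; adjust them to your setup.
text_cdf   <- "PrimeView.cdf"                     # text CDF from the Affymetrix zip
binary_cdf <- file.path("binary", "PrimeView.cdf") # binary copy goes in a separate directory

if (file.exists(text_cdf)) {
  dir.create(dirname(binary_cdf), showWarnings = FALSE)
  # Rewrite the CDF in binary format (the output must not overwrite the input)
  affxparser::convertCdf(text_cdf, binary_cdf)
  # Build a package source directory from the binary CDF
  makecdfenv::make.cdf.package(basename(binary_cdf),
                               cdf.path = dirname(binary_cdf),
                               species  = "Homo_sapiens")
}
```

The package directory that make.cdf.package() writes can then be 
installed with R CMD INSTALL. By default the package name is derived 
from the cdf file name, so it may clash with the cdf package from BioC; 
the packagename argument lets you pick a different one.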

Alternatively, you can use one of the MBNI remapped cdfs. You will want 
to go here:

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

then choose the mapping you like, and then figure out what the cdf 
package name is. You can then use biocLite() to get the correct cdf for 
your version of R. So if you like Entrez Gene mappings, you would want

biocLite("primeviewhsentrezgcdf")

Best,

Jim

On 6/19/2013 3:49 AM, Max Kauer wrote:
> Thanks everybody for your replies!
> Maybe a short addition: the reason I wanted to make this pd.info file was
> that I wanted to try SCAN.UPC on these arrays (which requires that file).
> Otherwise, I already tried rma with the probe and cdf files from the
> Bioconductor site, and this worked just fine. At least it seemed fine to
> me. Now, with probes mapping to multiple probesets, I wonder if that could
> do something funny to the analysis.
>
> Best,
> Max
>
>
>
> -----Original Message-----
> From: cstrato [mailto:cstrato at aon.at]
> Sent: Tuesday, June 18, 2013 9:18 PM
> To: Max Kauer
> Cc: Bioconductor at r-project.org
> Subject: Re: [BioC] makePdInfoPackage for Primeview arrays
>
> Dear Max,
>
> In principle you could also use package xps, which can handle PrimeView
> arrays. To create a root 'scheme' file (see vignette xps.pdf) you simply
> need to do:
>
> ### new R session: load library xps
> library(xps)
>
> ### define directories:
> # directory containing Affymetrix library files
> libdir <- "/Volumes/GigaDrive/Affy/libraryfiles"
> # directory containing Affymetrix annotation files
> anndir <- "/Volumes/GigaDrive/Affy/Annotation"
> # directory to store ROOT scheme files
> scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes"
>
> ### create scheme file:
> scheme.primeview <- import.expr.scheme("primeview",
>                            filedir    = file.path(scmdir, "na33"),
>                            schemefile = file.path(libdir, "PrimeView.CDF"),
>                            probefile  = file.path(libdir, "PrimeView.probe.tab"),
>                            annotfile  = file.path(anndir, "Version12Nov",
>                                                   "PrimeView.na33.annot.csv"))
>
> For more information and examples see also the example scripts in
> xps/examples/script4schemes.R and xps/examples/script4xps.R
>
> Best regards,
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
> V.i.e.n.n.a           A.u.s.t.r.i.a
> e.m.a.i.l:        cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
>
>
> On 6/18/13 3:25 PM, Max Kauer wrote:
>> Hi,
>>
>> I am trying to make a pd.info package for the Affy Primeview array,
>> but I get an error.
>>
>> Thanks for any help!
>>
>> Cheers,
>>
>> Max
>>
>>
>>
>>
>>
>> This is my code:
>>
>>
>>
>> library(pdInfoBuilder)
>>
>> cdf <- list.files(pathAnnotPr, pattern = ".cdf", full.names = TRUE)
>> cel <- list.files(pathC, pattern = ".CEL", full.names = TRUE)[1]  # take first array
>> tab <- list.files(pathAnnotPr, pattern = "_tab", full.names = TRUE)
>>
>> seed <- new("AffyExpressionPDInfoPkgSeed",
>>         cdfFile = cdf, celFile = cel,
>>         tabSeqFile = tab, author = "xx",
>>         email = "xx",
>>         biocViews = "AnnotationData",
>>         genomebuild = "hg19",
>>         organism = "Human", species = "Homo Sapiens",
>>         url = "xx"
>> )
>>
>> makePdInfoPackage(seed, destDir = ".")
>>
>>
>>
>>
>>
>>
>>
>> Which produces this output/error (although a pd.primeview directory is
>> created):
>>
>>
>>
>> ================================================================================
>> Building annotation package for Affymetrix Expression array
>> CDF...............:  PrimeView.cdf
>> CEL...............:  MJ_05042013_TAS_10_PrimeView.CEL
>> Sequence TAB-Delim:  PrimeView.probe_tab
>> ================================================================================
>> Parsing file: PrimeView.cdf... OK
>> Parsing file: MJ_05042013_TAS_10_PrimeView.CEL... OK
>> Parsing file: PrimeView.probe_tab... OK
>> Getting information for featureSet table... OK
>> Getting information for pm/mm feature tables...
>> OK
>> Combining probe information with sequence information... OK
>> Getting PM probes and sequences... OK
>> Done parsing.
>> Creating package in ./pd.primeview
>> Inserting 49395 rows into table featureSet... OK
>> Inserting 609663 rows into table pmfeature... Error in
>> sqliteExecStatement(con, statement, bind.data) :
>>     RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
>> In addition: Warning messages:
>> 1: In parseCdfCelProbe(object@cdfFile, object@celFile, object@tabSeqFile, :
>>     Probe sequences were not found for all PM probes. These probes will
>>     be removed from the pmSequence object.
>> 2: In parseCdfCelProbe(object@cdfFile, object@celFile, object@tabSeqFile, :
>>     Probe sequences were not found for all MM probes. These probes will
>>     be removed from the mmSequence object.
>>
>>
>>
>>
>>
>>
>>
>> > sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0         oligoClasses_1.22.0
>> [4] affxparser_1.32.1    RSQLite_0.11.4       DBI_0.2-7
>> [7] Biobase_2.20.0       BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>>  [1] affyio_1.28.0         BiocInstaller_1.10.2  Biostrings_2.28.0
>>  [4] bit_1.1-10            codetools_0.2-8       ff_2.2-11
>>  [7] foreach_1.4.1         GenomicRanges_1.12.4  IRanges_1.18.1
>> [10] iterators_1.0.6       preprocessCore_1.22.0 splines_3.0.0
>> [13] stats4_3.0.0          zlibbioc_1.6.0
>>
>>
>>
>> Max Kauer
>>
>> CHILDREN'S CANCER RESEARCH INSTITUTE
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099