[BioC] pdInfoBuilder fails on Affy's GeneChip Human Transcriptome Array 2.0

James W. MacDonald jmacdon at uw.edu
Thu Jan 23 20:00:54 CET 2014


This is a known problem. See the email sent to the list just a few
hours ago, asking for an update on progress:

https://stat.ethz.ch/pipermail/bioconductor/attachments/20140123/4847d433/attachment.pl
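
In the meantime, here is a minimal sketch for narrowing down where a
"datatype mismatch" might come from. It rests on an unconfirmed
assumption: that some column SQLite expects to hold integers (an ID
column, say) contains non-numeric strings in the annotation file. The
helper count_non_integer() and the toy data frame are hypothetical and
are not part of pdInfoBuilder.

```r
# Hypothetical helper (not part of pdInfoBuilder): for each column, count
# the non-missing, non-empty values that are not plain whole numbers.
count_non_integer <- function(df) {
  vapply(df, function(col) {
    vals <- as.character(col)
    vals <- vals[!is.na(vals) & vals != ""]
    sum(!grepl("^-?[0-9]+$", vals))
  }, integer(1))
}

# Toy stand-in for a probeset annotation table; the IDs are made up.
toy <- data.frame(
  probeset_id = c("2315101", "2315102", "PSR0001.hg.1"),
  start       = c(100L, 250L, 400L),
  stringsAsFactors = FALSE
)

# Columns with a non-zero count hold values that an INTEGER column
# could reject.
print(count_non_integer(toy))
```

Running count_non_integer(test_csv) on the data frame read in the
session quoted below would show which columns, if any, hold such values.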

Best,

Jim

On 1/23/2014 1:34 PM, Guilherme Rocha wrote:
>   Dear all,
>
> I am trying to create the pdInfoBuilder packages for Affy's GeneChip Human
> Transcriptome Array 2.0.
>
> I am using the "original" pgf, clf, mps, and probeset.csv files from the
> library files on Affy's website (
> http://www.affymetrix.com/Auth/analysis/downloads/lf/hta/HTA-2_0/AGCC_library_installer_HTA-2_0.zip
> ).
>
> I was able to read the probeset.csv file using plain vanilla read.csv.
> Thus, it is likely that the solution given to a similar problem with
> Arabidopsis chips does not apply ("pdInfoBuilder fails on the new
> Arabidopsis Gene ST 1.0 & 1.1 arrays",
> https://stat.ethz.ch/pipermail/bioconductor/2012-March/044231.html).
>
> Details are shown below.
>
> Any help is greatly appreciated.
>
> Regards,
>
> Guilherme Rocha
>
>
> ------------------------------------------------------------------------------------------------------------
> R Code and output:
>
>> library(pdInfoBuilder)
> Loading required package: Biobase
> Loading required package: BiocGenerics
> Loading required package: parallel
>
> Attaching package: 'BiocGenerics'
>
> The following objects are masked from 'package:parallel':
>
>      clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>      clusterExport, clusterMap, parApply, parCapply, parLapply,
>      parLapplyLB, parRapply, parSapply, parSapplyLB
>
> The following object is masked from 'package:stats':
>
>      xtabs
>
> The following objects are masked from 'package:base':
>
>      Filter, Find, Map, Position, Reduce, anyDuplicated, append,
>      as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq,
>      get, intersect, is.unsorted, lapply, mapply, match, mget, order,
>      paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int,
>      rownames, sapply, setdiff, sort, table, tapply, union, unique,
>      unlist
>
> Welcome to Bioconductor
>
>      Vignettes contain introductory material; view with
>      'browseVignettes()'. To cite Bioconductor, see
>      'citation("Biobase")', and for packages 'citation("pkgname")'.
>
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: affxparser
> Loading required package: oligo
> Loading required package: oligoClasses
> Welcome to oligoClasses version 1.24.0
> ================================================================================
> Welcome to oligo version 1.26.0
> ================================================================================
>
> Attaching package: 'oligo'
>
> The following object is masked from 'package:BiocGenerics':
>
>      normalize
>
>> base_dir = "./"
>>
>> pgf          = paste(base_dir, "/HTA-2_0.r1.pgf", sep="")
>> clf          = paste(base_dir, "/HTA-2_0.r1.clf", sep="")
>> prob         = paste(base_dir, "/HTA-2_0.na33.hg19.probeset.csv", sep="")
>> core_mps     = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>> extended_mps = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>> full_mps     = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>>
>> test_csv     = read.csv(paste(base_dir,
> "/HTA-2_0.na33.hg19.probeset.csv", sep=""), skip=14, header=T)
>> seed = new("AffyExonPDInfoPkgSeed",
> +            pgfFile     = pgf,
> +            clfFile     = clf,
> +            probeFile   = prob,
> +            coreMps     = core_mps,
> +            extendedMps = extended_mps,
> +            fullMps     = full_mps,
> +            author      = "GR",
> +            email       = "anemailadress at gmail.com",
> +            biocViews   = "AnnotationData",
> +            genomebuild = "GRCh37",
> +            organism    = "Human",
> +            species     = "Homo sapiens",
> +            url         = "")
>> makePdInfoPackage(seed, destDir=base_dir);
> ================================================================================
> Building annotation package for Affymetrix Exon ST Array
> PGF.........: HTA-2_0.r1.pgf
> CLF.........: HTA-2_0.r1.clf
> Probeset....: HTA-2_0.na33.hg19.probeset.csv
> Transcript..: TheTranscriptFile
> Core MPS....: HTA-2_0.r1.Psrs.mps
> Full MPS....: HTA-2_0.r1.Psrs.mps
> Extended MPS: HTA-2_0.r1.Psrs.mps
> ================================================================================
> Parsing file: HTA-2_0.r1.pgf... OK
> Parsing file: HTA-2_0.r1.clf... OK
> Creating initial table for probes... OK
> Creating dictionaries... OK
> Parsing file: HTA-2_0.na33.hg19.probeset.csv... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Creating package in .//pd.hta.2.0
> Inserting 850 rows into table chrom_dict... OK
> Inserting 5 rows into table level_dict... OK
> Inserting 11 rows into table type_dict... OK
> Inserting 577432 rows into table core_mps... OK
> Inserting 577432 rows into table full_mps... OK
> Inserting 577432 rows into table extended_mps... OK
> Inserting 1839617 rows into table featureSet... Error in
> sqliteExecStatement(con, statement, bind.data) :
>    RS-DBI driver: (RS_SQLite_exec: could not execute: datatype mismatch)
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] pdInfoBuilder_1.26.0 oligo_1.26.0         oligoClasses_1.24.0
> [4] affxparser_1.34.0    RSQLite_0.11.4       DBI_0.2-7
> [7] Biobase_2.22.0       BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>   [1] BiocInstaller_1.12.0  Biostrings_2.30.0     GenomicRanges_1.14.1
>   [4] IRanges_1.20.0        XVector_0.2.0         affyio_1.30.0
>   [7] bit_1.1-10            codetools_0.2-8       ff_2.2-12
> [10] foreach_1.4.1         iterators_1.0.6       preprocessCore_1.24.0
> [13] splines_3.0.2         stats4_3.0.2          zlibbioc_1.8.0

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


