[BioC] Using pdInfoBuilder for Human Exon Array

Mon Jun 20 23:41:01 CEST 2011

Hi, 

My name is Maria Rodrigo-Domingo, and I am a PhD student in Biostatistics in Aalborg, Denmark. I am trying to create an annotation package for the Affymetrix HuEx-1_0 v2 array with the pdInfoBuilder package, so that I can build a custom CDF file afterwards. However, I find a number of inconsistencies between the output from R (see below) and the documentation, "Building Annotation Packages with pdInfoBuilder for Use with the oligo Package" by Benilton Carvalho, dated April 7, 2009. I have marked the differences between the documentation and my output with asterisks (****).

Are these differences only due to updates in the package that are not yet reflected in the documentation, or are they actual errors that will be carried over to my CDF file? The messages related to the core, full and extended probeset files do not appear in the documentation either. I think these files were not a possible input when the documentation was written, so I assume those messages are correct and will not comment on them.

In addition, I get two warning messages that I believe originate from the file "HuEx-1_0-st-v2.r2.pgf". Any idea what could be causing them, and whether they will affect the resulting package?

Thanks for any feedback.

Best,
Maria

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

 Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("http://bioconductor.org/biocLite.R")
> biocLite("pdInfoBuilder")
>
> library("pdInfoBuilder")
> setwd("*******************")
>
> pgfFile = "HuEx-1_0-st-v2.r2.pgf";
> clfFile = "HuEx-1_0-st-v2.r2.clf";
> probeFile = "HuEx-1_0-st-v2.na31.hg19.probeset.csv"
> transFile = "HuEx-1_0-st-v2.na31.hg19.transcript.csv";
> coreMps = "HuEx-1_0-st-v2.r2.dt1.hg18.core.mps";
> extendedMps = "HuEx-1_0-st-v2.r2.dt1.hg18.extended.mps";
> fullMps = "HuEx-1_0-st-v2.r2.dt1.hg18.full.mps"
>
> pkg <- new("AffyExonPDInfoPkgSeed",
+               version = "0.0.2",
+               author = "Maria********", email="*****************",
+               biocViews = "AnnotationData",
+               genomebuild = "hg19",
+               organism = "Human", species = "Homo Sapiens",
+               pgfFile=pgfFile,
+               clfFile=clfFile,
+               probeFile=probeFile,
+               transFile=transFile,
+               coreMps=coreMps,
+               extendedMps=extendedMps,
+               fullMps=fullMps);
>
> makePdInfoPackage(pkg, destDir=".");

Building annotation package for Affymetrix Exon ST Array
PGF.........: HuEx-1_0-st-v2.r2.pgf
CLF.........: HuEx-1_0-st-v2.r2.clf
Probeset....: HuEx-1_0-st-v2.na31.hg19.probeset.csv
Transcript..: HuEx-1_0-st-v2.na31.hg19.transcript.csv
Core MPS....: HuEx-1_0-st-v2.r2.dt1.hg18.core.mps
Full MPS....: HuEx-1_0-st-v2.r2.dt1.hg18.full.mps
Extended MPS: HuEx-1_0-st-v2.r2.dt1.hg18.extended.mps
================================================================================
Parsing file: HuEx-1_0-st-v2.r2.pgf...
Parsing file: HuEx-1_0-st-v2.r2.clf...
Creating initial table for probes...
Creating dictionaries...
Parsing file: HuEx-1_0-st-v2.na31.hg19.probeset.csv...
Parsing file: HuEx-1_0-st-v2.r2.dt1.hg18.core.mps...
Parsing file: HuEx-1_0-st-v2.r2.dt1.hg18.extended.mps...
Parsing file: HuEx-1_0-st-v2.r2.dt1.hg18.full.mps...
**** MISSING Creating probeset -> table... OK ****
**** MISSING Creating genes table... OK ****
Creating package in ./pd.huex.1.0.st.v2
Inserting 525 rows into table chrom_dict... **** MORE ROWS THAN IN DOCUMENTATION ****
Inserting 5 rows into table level_dict...
Inserting 8 rows into table type_dict...
Inserting 233001 rows into table core_mps... OK
Inserting 878628 rows into table full_mps... OK
Inserting 544116 rows into table extended_mps... OK
**** MISSING Inserting 1625370 rows into table "fset2gene"... OK ****
**** MISSING Inserting 114281 rows into table "gene"... OK ****
Inserting 1432143 rows into table featureSet... OK **** MORE ROWS THAN IN DOCUMENTATION ****
Inserting 5411273 rows into table pmfeature... OK **** MORE ROWS THAN IN DOCUMENTATION ****
Inserting 21249 rows into table mmfeature... OK **** SHOULD THIS BE bgfeature INSTEAD? IF SO, LESS ROWS THAN IN DOCUMENTATION ****
**** IN THE FOLLOWING LINES, INCONSISTENCIES BETWEEN OUTPUT AND DOCUMENTATION CORRESPONDING TO THE ABOVE: MISSING OUTPUT FOR "fset2gene" AND "gene". "bg" PROBABLY EXCHANGED FOR "mm" ****
Counting rows in chrom_dict
Counting rows in core_mps
Counting rows in extended_mps
Counting rows in featureSet
Counting rows in full_mps
Counting rows in level_dict
Counting rows in mmfeature
Counting rows in pmfeature
Counting rows in type_dict
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Creating index idx_core_meta_fsetid on core_mps... OK
Creating index idx_core_fsetid on core_mps... OK
Creating index idx_full_meta_fsetid on full_mps... OK
Creating index idx_full_fsetid on full_mps... OK
Creating index idx_extended_meta_fsetid on extended_mps... OK
Creating index idx_extended_fsetid on extended_mps... OK
Creating index idx_mmfsetid on mmfeature... OK
Creating index idx_mmfid on mmfeature... OK
Saving DataFrame object for PM.
Saving DataFrame object for MM.
Saving NetAffx Annotation... OK **** DOES NOT APPEAR IN DOCUMENTATION ****
Done.
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
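
One possible way to check which tables the build actually wrote, and how many rows each holds, is to query the package's SQLite file directly and compare the counts with the log above. This is only a sketch: it assumes the DBI and RSQLite packages are installed and that the database sits at inst/extdata/pd.huex.1.0.st.v2.sqlite inside the generated package directory. The helper count_rows is a hypothetical name, not part of pdInfoBuilder.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: count the rows of each table reachable through
# an open DBI connection and return them as a named numeric vector.
count_rows <- function(con, tables = dbListTables(con)) {
  vapply(tables, function(tbl) {
    quoted <- as.character(dbQuoteIdentifier(con, tbl))
    query <- sprintf("SELECT COUNT(*) AS n FROM %s", quoted)
    as.numeric(dbGetQuery(con, query)$n)
  }, numeric(1))
}

# Assumed location of the database inside the generated package directory.
db_path <- file.path("pd.huex.1.0.st.v2", "inst", "extdata",
                     "pd.huex.1.0.st.v2.sqlite")

if (file.exists(db_path)) {
  con <- dbConnect(SQLite(), db_path)
  print(count_rows(con))  # compare with the "Inserting ... rows" lines
  dbDisconnect(con)
}
```

If a table such as "gene" or "fset2gene" is absent from the result, it was not created by this version of the builder. That would suggest the missing log lines reflect a change in the package rather than a failed step.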