[BioC] Mouse Gene ST v1 CDF Issues (MoGene10stv1): Failure of affyPLM and pdInfoBuilder

James W. MacDonald jmacdon at med.umich.edu
Tue Dec 2 15:32:28 CET 2008


Hi Peter,

I won't comment on aroma.affymetrix or on building cdf packages with 
makecdfenv: the former has its own mailing list, and the latter isn't 
really supported. The list archive you quote is Ben Bolstad showing 
that you _could_ use makecdfenv, but then raising several questions 
that, to my knowledge, have not been resolved.

As for building a pdInfoPackage, this works fine for me:

 > makePdInfoPackage(pkg, destDir=".")
Creating package in ./pd.mogene.1.0.st.v1
loadUnitsByBatch took 46.92 sec
loadAffyCsv took 19.19 sec
loadAffySeqCsv took 51.92 sec
DB sort, index creation took 20.82 sec
[1] TRUE
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
 > sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     stats     graphics  grDevices datasets  utils
[8] methods   base

other attached packages:
[1] pdInfoBuilder_1.6.0  oligo_1.6.0          oligoClasses_1.4.0
[4] AnnotationDbi_1.4.0  preprocessCore_1.4.0 affxparser_1.14.0
[7] RSQLite_0.7-1        DBI_0.2-4            Biobase_2.2.0

Note that it would have been helpful for you to give us your 
sessionInfo() as well.

The install went fine:

---------- Making package pd.mogene.1.0.st.v1 ------------
   adding build stamp to DESCRIPTION
   installing NAMESPACE file and metadata
   installing R files
   installing inst files
   preparing package pd.mogene.1.0.st.v1 for lazy loading
Loading required package: RSQLite
Loading required package: DBI
Loading required package: oligoClasses
Loading required package: Biobase
Loading required package: tools

Welcome to Bioconductor

   Vignettes contain introductory material. To view, type
   'openVignette()'. To cite Bioconductor, see
   'citation("Biobase")' and for packages 'citation(pkgname)'.

   no man files in this package
   installing indices
   installing help
   adding MD5 sums

* DONE (pd.mogene.1.0.st.v1)

I would bet that your problem stems from having Cygwin installed 
alongside the Windows toolset (Rtools). If your PATH isn't set 
correctly, the build may pick up the wrong version of certain tools and 
things won't build correctly.

I have personally found that Cygwin is problematic when installed, and 
that uninstalling it can make matters worse because, for whatever 
reason, certain tools can then no longer be found. Does the Rtools 
install directory come before Cygwin in your PATH?
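
A quick way to check the ordering from within R is sketched below. The 
helper name path_positions is made up for illustration, and matching on 
the strings "rtools" and "cygwin" assumes the default install directory 
names, so adjust the patterns if you installed elsewhere. For what it's 
worth, "FIND: Parameter format not correct" is the message Windows' own 
find.exe prints when it is handed Unix-style arguments, so it is worth 
seeing which find gets resolved first.

```r
# Sketch: report the position of the first PATH entry that mentions
# Rtools and the first that mentions Cygwin (NA if not present).
# 'path_positions' is a hypothetical helper, not part of any package.
path_positions <- function(path = Sys.getenv("PATH"),
                           sep = .Platform$path.sep) {
  dirs <- strsplit(path, sep, fixed = TRUE)[[1]]
  first_match <- function(pattern) {
    hits <- grep(pattern, dirs, ignore.case = TRUE)
    if (length(hits)) hits[1] else NA_integer_
  }
  c(rtools = first_match("rtools"), cygwin = first_match("cygwin"))
}

pos <- path_positions()
print(pos)

# If both are present, Rtools should have the smaller position.
if (!any(is.na(pos)) && pos["cygwin"] < pos["rtools"]) {
  message("Cygwin precedes Rtools on the PATH; consider reordering.")
}

# Show which executables would actually be picked up during a build.
print(Sys.which(c("find", "make", "sh")))
```

If find resolves to something under the Windows system directory rather 
than the Rtools bin directory, that would be consistent with the error 
you are seeing.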

Best,

Jim



Peter White wrote:
> I am having some issues with the Affymetrix Mouse Gene ST 1.0 array
> (MoGene10stv1) and bioconductor. I can see that there are issues regarding this
> array and the unsupported CDF that can be downloaded from Affy but I was able
> to create the mogene10stv1cdf library as outlined in the thread:
> 
> https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html
> 
> I have processed the data using both Bioconductor's affy package and the
> aroma.affymetrix package but get different results. I believe the issue is that
> aroma is using the affyPLM model. I wanted to check this using the Bioconductor
> affyPLM package but it will not work:
> 
> Method 1 - works fine:
> 
> library(affy)
> AffyRaw <- ReadAffy()
> AffyEset <- rma(AffyRaw)
> data.affy <- exprs(AffyEset)
> 
> Method 2 - fails:
> 
> library(affyPLM)
> AffyRaw <- ReadAffy()
> fit <- fitPLM(AffyRaw, verbos=9)
> 
>  Background correcting PM
>  Normalizing PM
>  Fitting models
>  Error in fitPLM(AffyRaw, verbos = 9) :
>    Realloc could not re-allocate (size 1150530304) memory
> 
> I also tried the following but it still could not run:
> 
> fit <- fitPLM(AffyRaw, output.param=list(weights=FALSE, residuals=FALSE,
> varcov="none", resid.SE=FALSE))
> 
> Finally, I dropped the number of arrays from 16 to 6, then down to 2, but still
> no luck.
> 
> So from piecing together different threads I wondered if the issue lay with
> the unsupported CDF. So I attempted to use the pdInfoBuilder / oligo pipeline
> as outlined in this thread:
> 
> http://article.gmane.org/gmane.science.biology.informatics.conductor/18963/match=mogene
> 
> Again, I ran into problems:
> 
>> pgfFile <- "MoGene-1_0-st-v1.r3.pgf"
>> clfFile <- "MoGene-1_0-st-v1.r3.clf"
>> transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.txt"
>> probeFile <- "MoGene-1_0-st-v1.probe.tab"
>> pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White",
>      email="peter.white at nationwidechildrens.org", version="0.1.3",
>      genomebuild="UCSC mm9,  July 2007", biocViews="AnnotationData",
>      pgfFile=pgfFile, clfFile=clfFile, transFile=transFile,
>      probeFile=probeFile)
>> makePdInfoPackage(pkg, destDir=".")
> Creating package in ./pd.mogene.1.0.st.v1
> loadUnitsByBatch took 54.44 sec
> loadAffyCsv took 53.58 sec
> loadAffySeqCsv took 80.68 sec
> DB sort, index creation took 90.24 sec
> [1] TRUE
> Warning messages:
> 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> 
> I then closed R, opened a command prompt, and navigated to the directory
> containing the package:
> 
> R CMD INSTALL pd.mogene.1.0.st.v1\
> 
> installing to 'c:/PROGRA~2/R/R-28~1.0/library'
> 
> ---------- Making package pd.mogene.1.0.st.v1 ------------
>   adding build stamp to DESCRIPTION
>   installing NAMESPACE file and metadata
>   installing R files
>   installing inst files
> FIND: Parameter format not correct
> make[2]: *** [c:/PROGRA~2/R/R-28~1.0/library/pd.mogene.1.0.st.v1/inst] Error 2
> make[1]: *** [all] Error 2
> make: *** [pkg-pd.mogene.1.0.st.v1] Error 2
> *** Installation of pd.mogene.1.0.st.v1 failed ***
> 
> Removing 'c:/PROGRA~2/R/R-28~1.0/library/pd.mogene.1.0.st.v1'
> 
> So the installation fails and I cannot work out why (I have RTools and Cygwin
> installed). I did notice some inconsistencies in the annotation files for these
> arrays that can be downloaded from the Affy site and wondered if these could be
> the source of the problem:
> 
> 1. The file MoGene-1_0-st-v1.probe.tab contains 35,605 distinct
> transcript IDs.
> 2. The file MoGene-1_0-st-v1.na26.mm9.transcript.csv contains 35,567
> transcript IDs, so 38 transcript IDs are missing from this file. What are they
> and why were they not included? (10412488, 10412495, 10412500, 10412503,
> 10412520, 10417226, 10417239, 10417269, 10417286, 10441511, 10468907, 10490232,
> 10501544, 10535342, 10536010, 10536044, 10536095, 10536114, 10536118, 10536163,
> 10550163, 10550775, 10560746, 10577361, 10598118, 10598141, 10598159, 10598207,
> 10598220, 10598603, 10599086, 10606573, 10608226, 10608440, 10608551, 10608554,
> 10608603, 10608606)
> 3. The file MoGene-1_0-st-v1.r3.cdf contains 35,512 transcript IDs.
> So we are now missing an additional 93 probe sets (all of these can be found in
> the transcript file: 10338002, 10338005, 10338006, 10338007, 10338008,
> 10338009, 10338010, 10338011, 10338012, 10338013, 10338014, 10338015, 10338016,
> 10338018, 10338019, 10338020, 10338021, 10338022, 10338023, 10338024, 10338027,
> 10338028, 10338030, 10338031, 10338032, 10338033, 10338034, 10338038, 10338039,
> 10338040, 10338043, 10338045, 10338046, 10338048, 10338049, 10338050, 10338051,
> 10338052, 10338053, 10338054, 10338055, 10338057, 10338058, 10338061, 10338062,
> 10349381, 10350469, 10354866, 10361826, 10362430, 10362438, 10362444, 10362452,
> 10362872, 10369759, 10374030, 10391748, 10395778, 10411504, 10422960, 10436496,
> 10436660, 10446349, 10453719, 10457089, 10458079, 10460144, 10461932, 10481652,
> 10482786, 10487009, 10498317, 10501216, 10502040, 10502768, 10503414, 10513713,
> 10521665, 10532622, 10535929, 10546555, 10552810, 10553535, 10560364, 10582560,
> 10582566, 10582570, 10582576, 10585872, 10586931, 10592453, 10601614,
> 10602194). Again, why were they not included?
> 
> BTW: I am using R 2.8.0 and the latest release of Bioconductor (2.3) on a
> Windows XP 64-bit machine.
> 
> Any help out there would be greatly appreciated.
> 
> Thanks,
> 
> Peter
> 
> Peter White, Ph.D.
> Director, Biomedical Genomics Core
> Research Assistant Professor of Pediatrics
> The Research Institute at
> Nationwide Children's Hospital and
> The Ohio State University
> 
> Mailing Address:
> 
> The Research Institute at
> Nationwide Children's Hospital
> 700 Children's Drive, W510
> Columbus, OH 43205
> 
> Office: (614) 355-2671
> Lab: (614) 355-5252
> Fax: (614) 722-2818
> Web: http://genomics.nchresearch.org/
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


