[BioC] Missing probesets when creating Affymetrix GeneChip miRNA 4.0 CDF package using makecdfenv package

Isaac Neuhaus isaac.neuhaus at bms.com
Fri Jan 24 20:34:05 CET 2014


Lei Huang [guest] <guest at ...> writes:

> 
> 
> Dear all,
> 
> I am working on a set of Affymetrix GeneChip miRNA 4.0 microarray data
> and would like to perform differential expression analysis using
> Bioconductor packages. Since this is a fairly new platform, no CDF or
> annotation packages are available in the Bioconductor repository at the
> moment. The Affymetrix folks kindly provided me with the miRNA 4.0 CDF
> file as well as sample CEL data, so I decided to create a CDF package on
> my own using make.cdf.package() from the makecdfenv package. I was able
> to build the package and install it without trouble. However, after I
> read the raw CEL files and normalized the AffyBatch with vsnrma()/rma(),
> I found that the number of probesets is only 25065, while the original
> Affymetrix miRNA 4.0 CDF file contains 36249. I am aware that, starting
> with version 4, Affymetrix changed the naming convention for probeset
> IDs, but this shouldn't cause probesets to go missing. What did I do
> wrong? I would really appreciate any hints or advice on solving this
> problem.
> 
> -Lei
> 
> --
> Lei Huang
> Center for Research Informatics
> Biological Science Division
> University of Chicago
> http://cri.uchicago.edu
> --
> 
> P.S. The following are the code and output from my R session:
> 
> > setwd("~/Documents/Project/mirna/GeneChip 4-0 Array Sample Data")
> > library(affy)
> > library(makecdfenv)
> Loading required package: affyio
> > pkgpath <- tempdir()
> > pname <- cleancdfname(whatcdf("20131118_Human-Brain-AM7962-130ng_rep1_(miRNA-4_0).CEL"))
> > make.cdf.package("miRNA-4_0-st-v1.cdf",
> +                  cdf.path="~/Documents/Project/mirna/miRNA-4_0-st-v1_CDF",
> +                  compress=FALSE, species = "", packagename=pname,
> +                  package.path = pkgpath)
> Reading CDF file.
> Creating CDF environment
> Wait for about 251 
dots........................................................................
............................................................................
............................................................................
.............................
> Creating package in /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf
> 
> README PLEASE:
> A source package has now been produced in
> /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf.
> Before using this package it must be installed via 'R CMD INSTALL'
> at a terminal prompt (or DOS command shell).
> If you are using Windows, you will need to get set up to install packages.
> See the 'R Installation and Administration' manual, specifically
> Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset'
> for more information.
> 
> Alternatively, you could use make.cdf.env(), which will not require you to install a package.
> However, this environment will only persist for the current R session
> unless you save() it.
> 
> ## install the cdf package from shell
> ## cd to mirna40cdf location
> ## R CMD INSTALL mirna40cdf
> 
> > library(limma)
> > library(vsn)
> > library(mirna40cdf)
> >
> > affybatch <- ReadAffy(filenames=list.files())
> > affybatch@cdfName
> [1] "miRNA-4_0"
> 
> ## normalization
> > eset.norm <- vsnrma(affybatch)
> vsn2: 292681 x 8 matrix (1 stratum). 
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
> 
> ## only 25,065 probesets; the original Affymetrix CDF file contains 36,249 probesets
> > dim(eset.norm)
> Features  Samples 
>    25065        8 
> 
>  -- output of sessionInfo(): 
> 
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] mirna40cdf_1.38.0    AnnotationDbi_1.24.0 vsn_3.30.0          
> [4] limma_3.18.9         makecdfenv_1.38.0    affyio_1.30.0       
> [7] affy_1.40.0          Biobase_2.22.0       BiocGenerics_0.8.0  
> 
> loaded via a namespace (and not attached):
>  [1] BiocInstaller_1.12.0  compiler_3.0.2        DBI_0.2-7            
>  [4] grid_3.0.2            IRanges_1.20.6        lattice_0.20-24      
>  [7] preprocessCore_1.24.0 RSQLite_0.11.4        stats4_3.0.2         
> [10] tools_3.0.2           zlibbioc_1.8.0       
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 

I came across a similar problem with a brainCDF, where makecdfenv was
producing a package with fewer probesets than the CDF file contained. I
believe the problem is in the C code that parses ASCII (text) CDF files,
since I was able to correct it by converting the text CDF to binary with
affxparser and then reading the binary file with the makecdfenv package:

library("affxparser")
library(makecdfenv)
# Convert the text CDF to a binary CDF (format version 4)
convertCdf("HGU133PLUS2_HS_REFSEQ.CDF", "hgu133plus2hsrefseqcdf",
           version=4, verbose=TRUE)
# Build the CDF package from the binary file instead of the text file
make.cdf.package("hgu133plus2hsrefseqcdf",
                 version = packageDescription("makecdfenv", field = "Version"),
                 species = "H. sapiens", unlink = TRUE)
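
To check whether a package built this way is complete, one option is to compare the unit count that the text CDF declares in its header with the number of probeset IDs that actually ended up in the package. The following is only a minimal base-R sketch under stated assumptions: it assumes the text CDF has a "NumberOfUnits=" line in its [Chip] section, and the helper names count_declared_units() and check_probeset_count() are made up for illustration and are not part of makecdfenv or affxparser.

```r
# Hypothetical helper: read the unit count declared in the header of a
# text-format CDF. Assumes a "NumberOfUnits=" line in the first few lines.
count_declared_units <- function(cdf_file, max_header_lines = 50L) {
  header <- readLines(cdf_file, n = max_header_lines, warn = FALSE)
  header <- sub("\r$", "", header)  # strip any stray carriage returns
  hit <- grep("^NumberOfUnits=", header, value = TRUE)
  if (length(hit) == 0L) {
    stop("No 'NumberOfUnits=' line found; is this a text-format CDF?")
  }
  as.integer(sub("^NumberOfUnits=", "", hit[1L]))
}

# Hypothetical helper: compare the declared count with the number of
# distinct probeset IDs that were actually built.
check_probeset_count <- function(cdf_file, built_ids) {
  declared <- count_declared_units(cdf_file)
  built <- length(unique(built_ids))
  list(declared = declared, built = built, missing = declared - built)
}
```

With a CDF package loaded, the probeset IDs can usually be listed with ls() on the package's environment, for example check_probeset_count("miRNA-4_0-st-v1.cdf", ls(mirna40cdf)) for the files in this thread, assuming the package exposes an environment named after itself. A non-zero "missing" value means some units were dropped while the package was built.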

I hope this helps.

Isaac


