[BioC] Missing probesets when creating Affymetrix GeneChip miRNA 4.0 CDF package using makecdfenv package

Huang, Lei [BSD] - CRI lhuang at bsd.uchicago.edu
Wed Jan 15 21:53:25 CET 2014


Thanks a lot, Jim! Do you think the problems you found also contribute to the missing probesets when building the CDF package with makecdfenv?

Best,

Lei
On Jan 15, 2014, at 2:30 PM, James W. MacDonald <jmacdon at u.washington.edu> wrote:

> Hi Lei,
>
> It turns out that there are at least two differences between the miRNA 4.0 array and those that came before it.
>
> First, there are now no MM probes at all (for the 3.0, for example, there were 180 MM probes). This is the cause of the error you see when trying to make the pd.mirna.4.0 package. The code expects MM probes and thus tries to put those probes into the 'mmfeature' table of the database, and errors when there are none. This is pretty easy to fix - you can just put a test for MM probes into the code, and if there are no MM probes you just skip that step. Hypothetically I could have patched the code and sent you a pd.mirna.4.0 package that would work (and then sent the patch to Benilton Carvalho).
>
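The guard described above could look roughly like the sketch below. It uses a toy data frame in base R, not the real package-building code: the column names (fid, fsetid, type) and the helper mm_rows_or_null() are hypothetical and chosen only for illustration.

```r
# Toy stand-in for a probe table; the column names are assumptions,
# not the actual schema used when building a pd package.
probes <- data.frame(
  fid    = 1:6,
  fsetid = c(1, 1, 2, 2, 3, 3),
  type   = rep("pm", 6),   # no MM probes, as on the miRNA 4.0 array
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the MM rows, or NULL when there are none.
mm_rows_or_null <- function(probes) {
  mm <- probes[probes$type == "mm", , drop = FALSE]
  if (nrow(mm) == 0L) {
    return(NULL)
  }
  mm
}

mm <- mm_rows_or_null(probes)
if (is.null(mm)) {
  # Skip the step instead of failing on an empty table.
  message("No MM probes found; skipping the mmfeature table.")
} else {
  message("Would write ", nrow(mm), " rows to the mmfeature table.")
}
```

The point is only that the MM step becomes conditional; where such a test would belong in the real code is not something this sketch settles.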
> However, there is a bigger problem that will require more effort, and should be handled by Benilton. Prior versions of the miRNA arrays never shared probes between probesets, so the code for building the pd package for the existing miRNA arrays is a modification of the code used to create pd packages for the Exon ST arrays, which also never share probes among probesets.
>
> The miRNA 4.0 array is now like the Gene ST arrays, which do share probes between probesets, so the code will have to be modified to account for that fact. This will take more than a couple of simple changes, so you (we) will have to wait for Benilton to fix it.
>
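To illustrate why shared probes matter, here is a small base R sketch with made-up data. The column names fid and probeset are hypothetical, and this is not a description of how the pd package code actually stores or loses probes; it only shows that a table assuming one probeset per probe cannot represent every probe/probeset pair.

```r
# Toy probe-to-probeset map; probes 102 and 103 each belong to two probesets.
map <- data.frame(
  fid      = c(101, 102, 103, 102, 104, 103),
  probeset = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Count the distinct probesets each probe belongs to.
n_sets <- tapply(map$probeset, map$fid, function(x) length(unique(x)))

# Probes that appear in more than one probeset.
shared <- names(n_sets)[n_sets > 1]
print(shared)

# If the probe ID had to be unique (one probeset per probe),
# only the first pair for each probe could be kept.
keyed_on_fid <- map[!duplicated(map$fid), ]
cat(nrow(map), "probe/probeset pairs, but only", nrow(keyed_on_fid),
    "rows remain when fid must be unique\n")
```

Running the same kind of count on the real probe-to-probeset mapping would show how many probes on the array are affected.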
> Best,
>
> Jim
>
> On 1/15/2014 1:15 AM, Lei Huang [guest] wrote:
>> Dear all,
>>
>> I am working on a set of Affymetrix GeneChip miRNA 4.0 microarray data and would like to perform differential expression analysis using Bioconductor packages. Since this is a fairly new platform, no CDF or annotation packages are available in the Bioconductor repository at the moment. The Affymetrix folks kindly provided me with the miRNA 4.0 CDF file as well as sample CEL data, so I decided to create a CDF package on my own using make.cdf.package() from the makecdfenv package. I was able to make the package and install it without trouble. However, after I read the raw CEL files and normalized the AffyBatch with vsnrma()/rma(), I found that the number of probesets is only 25065, while the original Affymetrix miRNA 4.0 CDF file contains 36249. I am aware that from version 4 Affymetrix changed the naming convention for probeset IDs, but this shouldn't cause probesets to go missing. What did I do wrong? I would really appreciate it if anyone could give me some hints or advice on solving this problem.
>>
>> -Lei
>>
>> --
>> Lei Huang
>> Center for Research Informatics
>> Biological Science Division
>> University of Chicago
>> http://cri.uchicago.edu
>> --
>>
>> P.S. The following are the code and output from my R session:
>>
>>> setwd("~/Documents/Project/mirna/GeneChip 4-0 Array Sample Data")
>>> library(affy)
>>> library(makecdfenv)
>> Loading required package: affyio
>>> pkgpath <- tempdir()
>>> pname <- cleancdfname(whatcdf("20131118_Human-Brain-AM7962-130ng_rep1_(miRNA-4_0).CEL"))
>>> make.cdf.package("miRNA-4_0-st-v1.cdf", cdf.path="~/Documents/Project/mirna/miRNA-4_0-st-v1_CDF",
>> +                  compress=FALSE, species = "", packagename=pname, package.path = pkgpath)
>> Reading CDF file.
>> Creating CDF environment
>> Wait for about 251 dots.............................................................................................................................................................................................................................................................
>> Creating package in /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf
>>
>> README PLEASE:
>> A source package has now been produced in
>> /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf.
>> Before using this package it must be installed via 'R CMD INSTALL'
>> at a terminal prompt (or DOS command shell).
>> If you are using Windows, you will need to get set up to install packages.
>> See the 'R Installation and Administration' manual, specifically
>> Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset'
>> for more information.
>>
>> Alternatively, you could use make.cdf.env(), which will not require you to install a package.
>> However, this environment will only persist for the current R session
>> unless you save() it.
>>
>> ## install the cdf package from shell
>> ## cd to mirna40cdf location
>> ## R CMD INSTALL mirna40cdf
>>
>>> library(limma)
>>> library(vsn)
>>> library(mirna40cdf)
>>>
>>> affybatch <- ReadAffy(filenames=list.files())
>>> affybatch@cdfName
>> [1] "miRNA-4_0"
>>
>> ## normalization
>>> eset.norm <- vsnrma(affybatch)
>> vsn2: 292681 x 8 matrix (1 stratum).
>> Please use 'meanSdPlot' to verify the fit.
>> Calculating Expression
>>
>> ## only 25,065 probesets, the original Affymetrix cdf file contains 36,249 probesets
>>> dim(eset.norm)
>> Features  Samples
>>    25065        8
>>
>>
>>  -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] mirna40cdf_1.38.0    AnnotationDbi_1.24.0 vsn_3.30.0
>> [4] limma_3.18.9         makecdfenv_1.38.0    affyio_1.30.0
>> [7] affy_1.40.0          Biobase_2.22.0       BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>>  [1] BiocInstaller_1.12.0  compiler_3.0.2        DBI_0.2-7
>>  [4] grid_3.0.2            IRanges_1.20.6        lattice_0.20-24
>>  [7] preprocessCore_1.24.0 RSQLite_0.11.4        stats4_3.0.2
>> [10] tools_3.0.2           zlibbioc_1.8.0
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>



