[BioC] Using custom CDF with 'make.cdf.env'

Scott Robinson Scott.Robinson at glasgow.ac.uk
Wed Aug 27 17:58:12 CEST 2014


Ah!

Sorry, as soon as I read that reply I saw the problem: I forgot to change the probe set ID in the individual Cell rows.

Thanks very much, James
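The slip described here can be caught mechanically. In that slip, probe rows keep a probeset name that no longer matches the block they sit in. Below is a minimal base-R sketch. The helper `find_mismatched_cells` is hypothetical. It assumes a tab-separated text CDF in which each [UnitN_BlockM] section has a Name= line and the fifth field of every Cell line carries the probeset name, as in the listings quoted in this thread.

```r
# Hypothetical checker: return Cell lines whose probeset name (5th field)
# differs from the Name= line of the [UnitN_BlockM] section they belong to.
# Assumes a tab-separated text CDF; [UnitN] headers (no _Block) are skipped.
find_mismatched_cells <- function(lines) {
  block_name <- NA_character_
  in_block <- FALSE
  bad <- character(0)
  for (ln in lines) {
    if (startsWith(ln, "[")) {
      # Entering a new section: only block sections are checked
      in_block <- grepl("^\\[Unit[0-9]+_Block[0-9]+\\]$", ln)
      block_name <- NA_character_
    } else if (in_block && startsWith(ln, "Name=")) {
      block_name <- sub("^Name=", "", ln)
    } else if (in_block && grepl("^Cell[0-9]+=", ln)) {
      qual <- strsplit(ln, "\t", fixed = TRUE)[[1]][5]
      if (!identical(qual, block_name)) bad <- c(bad, ln)
    }
  }
  bad
}
```

For example, `find_mismatched_cells(readLines("newmir1.cdf"))` would return the offending Cell lines, or `character(0)` if every row agrees with its block name.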

From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: 27 August 2014 16:52
To: Scott Robinson
Cc: bioconductor at r-project.org
Subject: Re: Using custom CDF with 'make.cdf.env'

Hi Scott,

I see some of what you have done. For example, you moved things around and changed the 'Cell' numbers:

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF
129939:Name=bta-let-7a_st
129946:Cell1=185        178     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   0
129947:Cell2=197        180     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   1
129948:Cell3=83 156     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   2       11
129949:Cell4=210        187     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   3

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf
43056:Cell5=185 178     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   4       11
43057:Cell6=197 180     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   5       11
43058:Cell7=83  156     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   6       11
43059:Cell8=210 187     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   7       11


This won't change anything. In both cases there is a probeset called bta-let-7a_st that has four identical probes. Putting these data somewhere else in the CDF won't change the way it is parsed.

In other words, this:

C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf
StopPosition=59
CellHeader=X    Y       PROBE   FEAT    QUAL    EXPOS   POS     CBASE   PBASE   TBA
Cell1=2 190     ACTCCATCATCCAACATATCAA  control hsa-let-7a_st   0       11      G
Cell2=196       180     ACTCCATCATCCAACATATCAA  control hsa-let-7a_st   1       11
Cell3=211       187     ACTCCATCATCCAACATATCAA  control hsa-let-7a_st   2       11
Cell4=29        205     ACTCCATCATCCAACATATCAA  control hsa-let-7a_st   3       11
Cell5=185       178     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   4       11
Cell6=197       180     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   5       11
Cell7=83        156     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   6       11
Cell8=210       187     ACTCCATCATCCAACATATCAA  control bta-let-7a_st   7       11
Cell9=2 189     ACTCCATCATCCAACATATCAA  control cbr-let-7_st    8       11      G
Cell10=178      178     ACTCCATCATCCAACATATCAA  control cbr-let-7_st    9       11
Cell11=212      189     ACTCCATCATCCAACATATCAA  control cbr-let-7_st    10      11
Cell12=189      181     ACTCCATCATCCAACATATCAA  control cbr-let-7_st    11      11
Cell13=179      178     ACTCCATCATCCAACATATCAA  control cel-let-7_st    12      11
Cell14=80       157     ACTCCATCATCCAACATATCAA  control cel-let-7_st    13      11
Cell15=215      191     ACTCCATCATCCAACATATCAA  control cel-let-7_st    14      11
Cell16=190      181     ACTCCATCATCCAACATATCAA  control cel-let-7_st    15      11
Cell17=79       157     ACTCCATCATCCAACATATCAA  control cfa-let-7a_st   16      11
Cell18=213      189     ACTCCATCATCCAACATATCAA  control cfa-let-7a_st   17      11
Cell19=182      179     ACTCCATCATCCAACATATCAA  control cfa-let-7a_st   18      11
Cell20=196      181     ACTCCATCATCCAACATATCAA  control cfa-let-7a_st   19      11
Cell21=205      184     ACTCCATCATCCAACATATCAA  control dre-let-7a_st   20      11
Cell22=188      181     ACTCCATCATCCAACATATCAA  control dre-let-7a_st   21      11
Cell23=216      191     ACTCCATCATCCAACATATCAA  control dre-let-7a_st   22      11
Cell24=83       157     ACTCCATCATCCAACATATCAA  control dre-let-7a_st   23      11
Cell25=77       157     ACTCCATCATCCAACATATCAA  control fru-let-7a_st   24      11
Cell26=212      188     ACTCCATCATCCAACATATCAA  control fru-let-7a_st   25      11
Cell27=193      181     ACTCCATCATCCAACATATCAA  control fru-let-7a_st   26      11
Cell28=182      180     ACTCCATCATCCAACATATCAA  control fru-let-7a_st   27      11
Cell29=188      180     ACTCCATCATCCAACATATCAA  control gga-let-7a_st   28      11
Cell30=211      189     ACTCCATCATCCAACATATCAA  control gga-let-7a_st   29      11
Cell31=78       157     ACTCCATCATCCAACATATCAA  control gga-let-7a_st   30      11
Cell32=199      180     ACTCCATCATCCAACATATCAA  control gga-let-7a_st   31      11
Cell33=214      188     ACTCCATCATCCAACATATCAA  control gga-let-7j_st   32      11
Cell34=191      181     ACTCCATCATCCAACATATCAA  control gga-let-7j_st   33      11
Cell35=180      177     ACTCCATCATCCAACATATCAA  control gga-let-7j_st   34      11
Cell36=203      180     ACTCCATCATCCAACATATCAA  control gga-let-7j_st   35      11
Cell37=211      188     ACTCCATCATCCAACATATCAA  control mdo-let-7a_st   36      11
Cell38=184      179     ACTCCATCATCCAACATATCAA  control mdo-let-7a_st   37      11
Cell39=195      181     ACTCCATCATCCAACATATCAA  control mdo-let-7a_st   38      11
Cell40=82       157     ACTCCATCATCCAACATATCAA  control mdo-let-7a_st   39      11
Cell41=179      177     ACTCCATCATCCAACATATCAA  control mml-let-7a_st   40      11
Cell42=190      182     ACTCCATCATCCAACATATCAA  control mml-let-7a_st   41      11
Cell43=214      191     ACTCCATCATCCAACATATCAA  control mml-let-7a_st   42      11
Cell44=202      180     ACTCCATCATCCAACATATCAA  control mml-let-7a_st   43      11
Cell45=183      179     ACTCCATCATCCAACATATCAA  control mmu-let-7a_st   44      11
Cell46=84       157     ACTCCATCATCCAACATATCAA  control mmu-let-7a_st   45      11
Cell47=194      181     ACTCCATCATCCAACATATCAA  control mmu-let-7a_st   46      11
Cell48=212      187     ACTCCATCATCCAACATATCAA  control mmu-let-7a_st   47      11
Cell49=76       157     ACTCCATCATCCAACATATCAA  control rno-let-7a_st   48      11
Cell50=192      181     ACTCCATCATCCAACATATCAA  control rno-let-7a_st   49      11
Cell51=181      177     ACTCCATCATCCAACATATCAA  control rno-let-7a_st   50      11
Cell52=212      191     ACTCCATCATCCAACATATCAA  control rno-let-7a_st   51      11
Cell53=187      181     ACTCCATCATCCAACATATCAA  control tni-let-7a_st   52      11
Cell54=128      77      ACTCCATCATCCAACATATCAA  control tni-let-7a_st   53      11
Cell55=81       157     ACTCCATCATCCAACATATCAA  control tni-let-7a_st   54      11
Cell56=213      191     ACTCCATCATCCAACATATCAA  control tni-let-7a_st   55      11
Cell57=214      189     ACTCCATCATCCAACATATCAA  control xtr-let-7a_st   56      11
Cell58=185      179     ACTCCATCATCCAACATATCAA  control xtr-let-7a_st   57      11
Cell59=22       202     ACTCCATCATCCAACATATCAA  control xtr-let-7a_st   58      11
Cell60=197      181     ACTCCATCATCCAACATATCAA  control xtr-let-7a_st   59      11

will not create a single probeset for let-7a over all species. And trying to combine 60 identical probes (here 22-mers) into a single probeset is about as useless as having 15 individual probesets made up of four identical probes each. You are still running RMA (or whatever) on essentially the same information, and the only differences between probes are due to technical variability. These arrays are, within the constraints of Affy's system, about as good as you can do. Which is to say, not very good.
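This point can be illustrated with a small base-R sketch. It uses a simplified model, which is an assumption and not makecdfenv's actual parser: rows are grouped by the probeset name in their fifth field, wherever they sit in the file. The hypothetical `group_cells` helper is applied to the first eight rows of the listing above, with tabs assumed between fields. It yields two probesets of four probes that share a single sequence. Reversing the row order leaves the grouping unchanged.

```r
# Hypothetical helper: group Cell lines by probeset name (QUAL, 5th field).
# Simplified model: grouping depends on the name in each row,
# not on the row's position in the file.
group_cells <- function(lines) {
  body <- sub("^Cell[0-9]+=", "", lines)
  fields <- strsplit(body, "\t", fixed = TRUE)
  probe <- vapply(fields, `[`, character(1), 3)  # probe sequence
  qual  <- vapply(fields, `[`, character(1), 5)  # probeset name
  split(probe, qual)
}

# Eight rows from the listing above, with tabs assumed between fields
cells <- c(
  "Cell1=2\t190\tACTCCATCATCCAACATATCAA\tcontrol\thsa-let-7a_st\t0\t11",
  "Cell2=196\t180\tACTCCATCATCCAACATATCAA\tcontrol\thsa-let-7a_st\t1\t11",
  "Cell3=211\t187\tACTCCATCATCCAACATATCAA\tcontrol\thsa-let-7a_st\t2\t11",
  "Cell4=29\t205\tACTCCATCATCCAACATATCAA\tcontrol\thsa-let-7a_st\t3\t11",
  "Cell5=185\t178\tACTCCATCATCCAACATATCAA\tcontrol\tbta-let-7a_st\t4\t11",
  "Cell6=197\t180\tACTCCATCATCCAACATATCAA\tcontrol\tbta-let-7a_st\t5\t11",
  "Cell7=83\t156\tACTCCATCATCCAACATATCAA\tcontrol\tbta-let-7a_st\t6\t11",
  "Cell8=210\t187\tACTCCATCATCCAACATATCAA\tcontrol\tbta-let-7a_st\t7\t11"
)

groups <- group_cells(cells)
print(lengths(groups))                 # two probesets of four probes each
print(length(unique(unlist(groups))))  # but only one distinct sequence
# Moving rows around does not change the grouping
print(identical(lengths(group_cells(rev(cells))), lengths(groups)))
```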

If you really want to combine them, then you also have to make the probeset IDs identical within each block. Here you would have to strip off the prepended species abbreviation and convert the gga-let-7j probes to let-7a_st (the cbr- and cel-let-7_st probes would need the same treatment), and then you would have just one probeset. But that will be a lot of work for what I imagine will be very little gain.
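A hypothetical base-R sketch of that renaming follows. It assumes tab-separated Cell lines with the probeset name in the fifth field. `collapse_let7_names` is an illustrative helper, not part of makecdfenv. It only rewrites Cell rows whose name matches the let-7 patterns in the listing above (let-7a, let-7j and bare let-7). It leaves block headers such as Name= untouched, so those would presumably need matching edits.

```r
# Hypothetical helper: give every species' let-7 probes one probeset name.
# Only Cell lines are rewritten; Name= and other header lines are left alone.
collapse_let7_names <- function(lines) {
  is_cell <- grepl("^Cell[0-9]+=", lines)
  fields <- strsplit(lines[is_cell], "\t", fixed = TRUE)
  fields <- lapply(fields, function(f) {
    # Drop the species prefix: "bta-let-7a_st" -> "let-7a_st"
    name <- sub("^[a-z]{3}-(let-7[aj]?_st)$", "\\1", f[5])
    # Fold "let-7j_st" and "let-7_st" into "let-7a_st"
    f[5] <- sub("^let-7j?_st$", "let-7a_st", name)
    f
  })
  lines[is_cell] <- vapply(fields, paste, character(1), collapse = "\t")
  lines
}
```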

Best,

Jim



On Wed, Aug 27, 2014 at 11:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
Hi Scott,

As far as I can tell, you haven't made any changes to the cdf at all:

> z <- make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.........................................................................
> z
<environment: 0x00000000113d5c08>
> length(ls(z))
[1] 7815
> zz <- as.list(z)
> table(sapply(zz, nrow))

   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> y <- make.cdf.env("miRNA-1_0.CDF")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots..........................................................................
> yy <- as.list(y)
> length(yy)
[1] 7815
> table(sapply(yy, nrow))

   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> all.equal(names(zz), names(yy))
[1] TRUE
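Before rebuilding the environment, a quick text-level pre-check can show whether the edited file contains fewer distinct probeset names at all. The sketch below is base R, and the helper name `count_probeset_names` is hypothetical. It assumes a tab-separated text CDF in which the fifth field of each Cell line inside a [Unit…] section is the probeset name. That assumption is drawn from the listings in this thread, not from how makecdfenv parses the file.

```r
# Hypothetical pre-check: count distinct probeset names (5th field of Cell
# lines) inside [Unit...] sections of a text CDF. Cell lines in [QC...]
# sections use a different layout, so they are excluded.
count_probeset_names <- function(path) {
  lines <- readLines(path)
  is_header <- grepl("^\\[", lines)
  # Header of the section each line belongs to ("" before the first header)
  section <- c("", lines[is_header])[cumsum(is_header) + 1]
  in_unit <- grepl("^\\[Unit", section)
  cells <- lines[in_unit & grepl("^Cell[0-9]+=", lines)]
  qual <- vapply(strsplit(cells, "\t", fixed = TRUE), `[`, character(1), 5)
  length(unique(qual))
}
```

For instance, `count_probeset_names("newmir1.cdf")` should fall below 7815 if the renaming took effect. The `length(ls(...))` check on the environment built by `make.cdf.env` remains the one that matters.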

Best,

Jim



On Wed, Aug 27, 2014 at 10:31 AM, Scott Robinson <Scott.Robinson at glasgow.ac.uk> wrote:
Dear All,

Since the files exceed 1 MB, here is a link to the old ("miRNA-1_0.CDF") and new ("newmir1.cdf") CDFs, the test script and an example CEL file:

http://www.files.com/set/53fdeb0aa2176

Thanks,

Scott
________________________________________
From: Scott Robinson [guest] [guest at bioconductor.org]
Sent: 27 August 2014 13:11
To: bioconductor at r-project.org; Scott Robinson
Cc: makecdfenv Maintainer
Subject: Using custom CDF with 'make.cdf.env'

Dear List,

I made a custom CDF by modifying the original Affymetrix miRNA v1 file. Because this chip is highly redundant, I condensed the original 7815 probe sets into 6190 probe sets (by 'moving' probes from one set to another). However, when I make and attach my new CDF environment I still seem to have 7815 probe sets, so presumably I have done something wrong.

I have read the vignette and many posts similar to mine, but still cannot work out what I am doing wrong. Perhaps the problem is with the CDF itself? I have a short script testing the functionality; its output is copied below. I will gladly attach the script, CDFs and an example CEL file if there is nothing obviously wrong with the code. I would do this now, but there doesn't appear to be an option on the web form.

Many thanks,

Scott


> folder <- "C:/Work/COPD-ASTHMA/microRNA files/newCDF/test/"
>
> setwd(paste0(folder,"CEL"))
> options(stringsAsFactors=FALSE)
> library(affy)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
    Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> library(makecdfenv)
Loading required package: affyio
>
> cleancdfname("newmir1.cdf")
[1] "newmir1.cdf"
> newmir1 = make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.......................................................................
> Data <- ReadAffy()
> Data@cdfName <- "newmir1"
>
> Data
AffyBatch object
size of arrays=230x230 features (17 kb)
cdf=newmir1 (7815 affyids)
number of samples=1
number of genes=7815
annotation=mirna102xgain
notes=
>
> dim(exprs(rma(Data)))
Background correcting
Normalizing
Calculating Expression
[1] 7815    1


 -- output of sessionInfo():

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] makecdfenv_1.36.0  affyio_1.28.0      affy_1.38.1        Biobase_2.20.1
[5] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] BiocInstaller_1.10.4  preprocessCore_1.22.0 tools_3.0.2
[4] zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.



--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
