[BioC] Using custom CDF with 'make.cdf.env'

James W. MacDonald jmacdon at uw.edu
Wed Aug 27 17:51:53 CEST 2014


Hi Scott,

I see some of what you have done. As an example, you moved things around,
and changed the 'Cell' number:

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF
129939:Name=bta-let-7a_st
129946:Cell1=185  178  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  0
129947:Cell2=197  180  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  1
129948:Cell3=83   156  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  2   11
129949:Cell4=210  187  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  3

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf
43056:Cell5=185  178  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  4   11
43057:Cell6=197  180  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  5   11
43058:Cell7=83   156  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  6   11
43059:Cell8=210  187  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  7   11


This won't change anything. In both cases there is a probeset called
bta-let-7a_st that has four identical probes. Moving these lines elsewhere
in the cdf and renumbering the cells won't change the way the file is parsed.
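
To make that concrete, here is a toy sketch in base R. It is not the
makecdfenv parser; it only assumes, as the listings suggest, that a probe
belongs to whatever probeset is named in its QUAL column and that the Cell
number is just a label. The cells() and by_name() helpers are made up for
this example.

```r
# Toy illustration only: this is NOT how makecdfenv reads a CDF internally.
# Assumption: probes are grouped by the probeset name in the QUAL column.
cells <- function(cell, x, y, qual) {
  data.frame(cell = cell, x = x, y = y, qual = qual, stringsAsFactors = FALSE)
}

# The four bta-let-7a_st rows as grep shows them in miRNA-1_0.CDF ...
old <- cells(1:4, c(185, 197, 83, 210), c(178, 180, 156, 187), "bta-let-7a_st")
# ... and the same rows after moving them and renumbering to Cell5..Cell8.
new <- cells(5:8, c(185, 197, 83, 210), c(178, 180, 156, 187), "bta-let-7a_st")

# Group the (x, y) locations by probeset name, ignoring the Cell number.
by_name <- function(d) split(d[, c("x", "y")], d$qual)

old_sets <- by_name(old)
new_sets <- by_name(new)

names(new_sets)                              # probeset names after the move
nrow(new_sets[["bta-let-7a_st"]])            # number of probes in that set
isTRUE(all.equal(old_sets, new_sets))        # same grouping before and after
```

Under that assumption, the grouping depends only on the name each probe
carries, so renumbering or relocating the rows leaves it untouched.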

In other words, this:

C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf
StopPosition=59
CellHeader=X    Y       PROBE   FEAT    QUAL    EXPOS   POS     CBASE   PBASE   TBA
Cell1=2     190  ACTCCATCATCCAACATATCAA  control  hsa-let-7a_st  0   11  G
Cell2=196   180  ACTCCATCATCCAACATATCAA  control  hsa-let-7a_st  1   11
Cell3=211   187  ACTCCATCATCCAACATATCAA  control  hsa-let-7a_st  2   11
Cell4=29    205  ACTCCATCATCCAACATATCAA  control  hsa-let-7a_st  3   11
Cell5=185   178  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  4   11
Cell6=197   180  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  5   11
Cell7=83    157  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  6   11
Cell8=210   187  ACTCCATCATCCAACATATCAA  control  bta-let-7a_st  7   11
Cell9=2     189  ACTCCATCATCCAACATATCAA  control  cbr-let-7_st   8   11  G
Cell10=178  178  ACTCCATCATCCAACATATCAA  control  cbr-let-7_st   9   11
Cell11=212  189  ACTCCATCATCCAACATATCAA  control  cbr-let-7_st   10  11
Cell12=189  181  ACTCCATCATCCAACATATCAA  control  cbr-let-7_st   11  11
Cell13=179  178  ACTCCATCATCCAACATATCAA  control  cel-let-7_st   12  11
Cell14=80   157  ACTCCATCATCCAACATATCAA  control  cel-let-7_st   13  11
Cell15=215  191  ACTCCATCATCCAACATATCAA  control  cel-let-7_st   14  11
Cell16=190  181  ACTCCATCATCCAACATATCAA  control  cel-let-7_st   15  11
Cell17=79   157  ACTCCATCATCCAACATATCAA  control  cfa-let-7a_st  16  11
Cell18=213  189  ACTCCATCATCCAACATATCAA  control  cfa-let-7a_st  17  11
Cell19=182  179  ACTCCATCATCCAACATATCAA  control  cfa-let-7a_st  18  11
Cell20=196  181  ACTCCATCATCCAACATATCAA  control  cfa-let-7a_st  19  11
Cell21=205  184  ACTCCATCATCCAACATATCAA  control  dre-let-7a_st  20  11
Cell22=188  181  ACTCCATCATCCAACATATCAA  control  dre-let-7a_st  21  11
Cell23=216  191  ACTCCATCATCCAACATATCAA  control  dre-let-7a_st  22  11
Cell24=83   157  ACTCCATCATCCAACATATCAA  control  dre-let-7a_st  23  11
Cell25=77   157  ACTCCATCATCCAACATATCAA  control  fru-let-7a_st  24  11
Cell26=212  188  ACTCCATCATCCAACATATCAA  control  fru-let-7a_st  25  11
Cell27=193  181  ACTCCATCATCCAACATATCAA  control  fru-let-7a_st  26  11
Cell28=182  180  ACTCCATCATCCAACATATCAA  control  fru-let-7a_st  27  11
Cell29=188  180  ACTCCATCATCCAACATATCAA  control  gga-let-7a_st  28  11
Cell30=211  189  ACTCCATCATCCAACATATCAA  control  gga-let-7a_st  29  11
Cell31=78   157  ACTCCATCATCCAACATATCAA  control  gga-let-7a_st  30  11
Cell32=199  180  ACTCCATCATCCAACATATCAA  control  gga-let-7a_st  31  11
Cell33=214  188  ACTCCATCATCCAACATATCAA  control  gga-let-7j_st  32  11
Cell34=191  181  ACTCCATCATCCAACATATCAA  control  gga-let-7j_st  33  11
Cell35=180  177  ACTCCATCATCCAACATATCAA  control  gga-let-7j_st  34  11
Cell36=203  180  ACTCCATCATCCAACATATCAA  control  gga-let-7j_st  35  11
Cell37=211  188  ACTCCATCATCCAACATATCAA  control  mdo-let-7a_st  36  11
Cell38=184  179  ACTCCATCATCCAACATATCAA  control  mdo-let-7a_st  37  11
Cell39=195  181  ACTCCATCATCCAACATATCAA  control  mdo-let-7a_st  38  11
Cell40=82   157  ACTCCATCATCCAACATATCAA  control  mdo-let-7a_st  39  11
Cell41=179  177  ACTCCATCATCCAACATATCAA  control  mml-let-7a_st  40  11
Cell42=190  182  ACTCCATCATCCAACATATCAA  control  mml-let-7a_st  41  11
Cell43=214  191  ACTCCATCATCCAACATATCAA  control  mml-let-7a_st  42  11
Cell44=202  180  ACTCCATCATCCAACATATCAA  control  mml-let-7a_st  43  11
Cell45=183  179  ACTCCATCATCCAACATATCAA  control  mmu-let-7a_st  44  11
Cell46=84   157  ACTCCATCATCCAACATATCAA  control  mmu-let-7a_st  45  11
Cell47=194  181  ACTCCATCATCCAACATATCAA  control  mmu-let-7a_st  46  11
Cell48=212  187  ACTCCATCATCCAACATATCAA  control  mmu-let-7a_st  47  11
Cell49=76   157  ACTCCATCATCCAACATATCAA  control  rno-let-7a_st  48  11
Cell50=192  181  ACTCCATCATCCAACATATCAA  control  rno-let-7a_st  49  11
Cell51=181  177  ACTCCATCATCCAACATATCAA  control  rno-let-7a_st  50  11
Cell52=212  191  ACTCCATCATCCAACATATCAA  control  rno-let-7a_st  51  11
Cell53=187  181  ACTCCATCATCCAACATATCAA  control  tni-let-7a_st  52  11
Cell54=128  77   ACTCCATCATCCAACATATCAA  control  tni-let-7a_st  53  11
Cell55=81   157  ACTCCATCATCCAACATATCAA  control  tni-let-7a_st  54  11
Cell56=213  191  ACTCCATCATCCAACATATCAA  control  tni-let-7a_st  55  11
Cell57=214  189  ACTCCATCATCCAACATATCAA  control  xtr-let-7a_st  56  11
Cell58=185  179  ACTCCATCATCCAACATATCAA  control  xtr-let-7a_st  57  11
Cell59=22   202  ACTCCATCATCCAACATATCAA  control  xtr-let-7a_st  58  11
Cell60=197  181  ACTCCATCATCCAACATATCAA  control  xtr-let-7a_st  59  11

will not create a single probeset for let-7a across all species. And
combining 60 identical probes into a single probeset is about as useless
as having 15 individual probesets of four identical probes each. Either way
you are running RMA (or whatever) on essentially the same information, and
the only differences between probes are due to technical variability.
These arrays are, within the constraints of Affy's system, about as good as
you can do. Which is to say, not very good.

If you really want to merge these into one probeset, you also have to make
the probeset IDs identical within each block. Here that means stripping off
the prepended species abbreviation and converting the gga-let-7j probes to
let-7a_st; then you would have just one probeset. But that will be a lot of
work for what I imagine will be very little gain.
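
As a rough sketch of the renaming step, the following base R snippet rewrites
a handful of the IDs from the listing above. It assumes the species prefix is
always three lowercase letters plus a hyphen, and merge_map is a hand-made
lookup invented for this example. Note that after stripping the prefix,
cbr-let-7_st and cel-let-7_st end up as let-7_st, so they would presumably
need the same treatment as gga-let-7j if they are meant to join the same
probeset.

```r
# Sketch only: rewrite probeset IDs so every let-7a probe shares one name.
# Assumptions (not from the thread): the species prefix is always three
# lowercase letters plus a hyphen, and `merge_map` is a hand-made lookup of
# the IDs that should be folded into let-7a_st.
ids <- c("hsa-let-7a_st", "bta-let-7a_st", "cbr-let-7_st", "cel-let-7_st",
         "gga-let-7a_st", "gga-let-7j_st", "xtr-let-7a_st")

# Step 1: strip the prepended species abbreviation.
stripped <- sub("^[a-z]{3}-", "", ids)

# Step 2: map the remaining variants onto the target name.
merge_map <- c("let-7j_st" = "let-7a_st", "let-7_st" = "let-7a_st")
hit <- stripped %in% names(merge_map)
stripped[hit] <- unname(merge_map[stripped[hit]])

# Distinct probeset names left after renaming.
unique(stripped)
```

The renamed IDs would still have to be written back into the CDF before
calling make.cdf.env again; that part is not shown here.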

Best,

Jim




On Wed, Aug 27, 2014 at 11:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Scott,
>
> As far as I can tell, you haven't made any changes to the cdf at all:
>
> > z <- make.cdf.env("newmir1.cdf")
> Reading CDF file.
> Creating CDF environment
> Wait for about 78 dots.........................................................................
> > z
> <environment: 0x00000000113d5c08>
> > length(ls(z))
> [1] 7815
> > zz <- as.list(z)
> > table(sapply(zz, nrow))
>
>    4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
> 6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> > y <- make.cdf.env("miRNA-1_0.CDF")
> Reading CDF file.
> Creating CDF environment
> Wait for about 78 dots..........................................................................
> > yy <- as.list(y)
> > length(yy)
> [1] 7815
> > table(sapply(yy, nrow))
>
>    4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
> 6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> > all.equal(names(zz), names(yy))
> [1] TRUE
>
> Best,
>
> Jim
>
>
>
>
> On Wed, Aug 27, 2014 at 10:31 AM, Scott Robinson <Scott.Robinson at glasgow.ac.uk> wrote:
>
>> Dear All,
>>
>> Since it exceeds 1MB, here is a link to the old ("miRNA-1_0.CDF") and new
>> ("newmir1.cdf") CDFs, test script and example CEL file:
>>
>> http://www.files.com/set/53fdeb0aa2176
>>
>> Thanks,
>>
>> Scott
>> ________________________________________
>> From: Scott Robinson [guest] [guest at bioconductor.org]
>> Sent: 27 August 2014 13:11
>> To: bioconductor at r-project.org; Scott Robinson
>> Cc: makecdfenv Maintainer
>> Subject: Using custom CDF with 'make.cdf.env'
>>
>> Dear List,
>>
>> I made a custom CDF by modifying the original Affymetrix miRNA v1 file.
>> As there is a great level of redundancy in this chip I have condensed the
>> original 7815 probe sets into 6190 probe sets (by 'moving' probes from one
>> set to another), however when I try making and attaching my new CDF
>> environment I still seem to have 7815 probe sets so presumably I must have
>> done something wrong.
>>
>> I have read the vignette and many similar posts to mine however still
>> cannot work out what I am doing wrong. Perhaps the problem is with the CDF
>> itself? I have a short script testing the functionality, the output of
>> which I have copied in below. I will gladly attach the script, CDFs and
>> example CEL file if there is nothing obviously wrong with the code - would
>> do this now but there doesn't appear to be an option on the webform.
>>
>> Many thanks,
>>
>> Scott
>>
>>
>> > folder <- "C:\\Work\\COPD-ASTHMA\\microRNA files\\newCDF\\test\\"
>> >
>> > setwd(paste0(folder,"CEL"))
>> > options(stringsAsFactors=FALSE)
>> > library(affy)
>> Loading required package: BiocGenerics
>> Loading required package: parallel
>>
>> Attaching package: ‘BiocGenerics’
>>
>> The following objects are masked from ‘package:parallel’:
>>
>>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>     clusterExport, clusterMap, parApply, parCapply, parLapply,
>>     parLapplyLB, parRapply, parSapply, parSapplyLB
>>
>> The following object is masked from ‘package:stats’:
>>
>>     xtabs
>>
>> The following objects are masked from ‘package:base’:
>>
>>     anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
>>     Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
>>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
>>     rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
>>     tapply, union, unique, unlist
>>
>> Loading required package: Biobase
>> Welcome to Bioconductor
>>
>>     Vignettes contain introductory material; view with
>>     'browseVignettes()'. To cite Bioconductor, see
>>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>>
>> > library(makecdfenv)
>> Loading required package: affyio
>> >
>> > cleancdfname("newmir1.cdf")
>> [1] "newmir1.cdf"
>> > newmir1 = make.cdf.env("newmir1.cdf")
>> Reading CDF file.
>> Creating CDF environment
>> Wait for about 78 dots.......................................................................
>> > Data <- ReadAffy()
>> > Data@cdfName <- "newmir1"
>> >
>> > Data
>> AffyBatch object
>> size of arrays=230x230 features (17 kb)
>> cdf=newmir1 (7815 affyids)
>> number of samples=1
>> number of genes=7815
>> annotation=mirna102xgain
>> notes=
>> >
>> > dim(exprs(rma(Data)))
>> Background correcting
>> Normalizing
>> Calculating Expression
>> [1] 7815    1
>>
>>
>>  -- output of sessionInfo():
>>
>> > sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252
>> [2] LC_CTYPE=English_United Kingdom.1252
>> [3] LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] makecdfenv_1.36.0  affyio_1.28.0      affy_1.38.1        Biobase_2.20.1
>> [5] BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] BiocInstaller_1.10.4  preprocessCore_1.22.0 tools_3.0.2
>> [4] zlibbioc_1.6.0
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>



