[BioC] Affy Tiling arrays & oligo

Henrik Bengtsson hb at stat.berkeley.edu
Wed Dec 12 21:03:21 CET 2007


Hi,

Just a comment. I had a quick look at
makePlatformDesign::makePDpackage(), more specifically at the function
makeBPMAPenv() that it calls. After all of the data have been read in,
they are restructured using nested rbind() and cbind() calls:

    bpmap <- .Call("ReadBPMAPFileIntoRList", bpmapFile, PACKAGE = "affyio")
    n <- length(bpmap$SequenceDescription)
    tmp <- do.call("rbind", lapply(1:n, function(obj) {
        tmp2 <- cbind(bpmap$SeqHead.PosInfo[[obj]]$PositionInformation,
            bpmap$SequenceDescription[[obj]]$Name)
        n2 <- nrow(tmp2)
        if (bpmap$Header$version >= 3)
            if (bpmap$SequenceDescription[[obj]]$ProbeMappingType == 1)
                tmp2 <- cbind(tmp2[, 1:2], x.mm = NA, y.mm = NA, tmp2[, -c(1:2)])
        return(tmp2)
    }))
    names(tmp)[10] <- "chromosome"
    rm(bpmap)

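Those few lines of code are likely to use a lot of memory: every
cbind() creates a new object, and do.call("rbind", ...) needs the full
list of pieces and the combined result in memory at the same time.
Rewriting them to pre-allocate the result structure and fill it in a
for-loop should help.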
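A minimal sketch of what a pre-allocating version could look like; it has not been tried on a real BPMAP file. It assumes the list returned by ReadBPMAPFileIntoRList has the fields used in the code quoted above and that every PositionInformation column is numeric. The names `readMockBpmap()` and `bpmapToTable()` and the column names in the mock are hypothetical. Because each output column is allocated once at full length, PM-only rows simply keep NA in x.mm/y.mm, so neither the per-sequence cbind() nor the final rbind() is needed:

```r
## Hypothetical stand-in for
## .Call("ReadBPMAPFileIntoRList", bpmapFile, PACKAGE = "affyio"):
## field names follow the code quoted above, but the PositionInformation
## column names (x, y, x.mm, y.mm, pos) are invented for illustration.
readMockBpmap <- function() {
  list(Header = list(version = 3),
       SequenceDescription = list(list(Name = "chr1", ProbeMappingType = 0),
                                  list(Name = "chr2", ProbeMappingType = 1)),
       SeqHead.PosInfo = list(
         list(PositionInformation = data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                                               x.mm = c(1, 2, 3), y.mm = c(5, 6, 7),
                                               pos = c(100, 200, 300))),
         list(PositionInformation = data.frame(x = c(7, 8), y = c(9, 10),
                                               pos = c(400, 500)))))
}

## Pre-allocating variant: count rows first, allocate each output column
## once, then fill row ranges in a plain for-loop.
## Assumes (hypothetically) that every PositionInformation column is numeric.
bpmapToTable <- function(readBpmap) {
  bpmap <- readBpmap()   # sole reference, so NA-ing elements can free them
  n <- length(bpmap$SequenceDescription)
  pmOnly <- function(i) bpmap$Header$version >= 3 &&
    bpmap$SequenceDescription[[i]]$ProbeMappingType == 1

  ## Output layout: PM-only sequences lack x.mm/y.mm, which sit after column 2.
  first <- names(bpmap$SeqHead.PosInfo[[1]]$PositionInformation)
  cols <- if (pmOnly(1)) append(first, c("x.mm", "y.mm"), after = 2) else first

  ## Pass 1: row counts per sequence and the start row of each block.
  sizes <- vapply(bpmap$SeqHead.PosInfo,
                  function(s) nrow(s$PositionInformation), integer(1))
  starts <- cumsum(sizes) - sizes + 1L
  total <- sum(sizes)

  ## Allocate everything once; x.mm/y.mm stay NA for PM-only rows.
  res <- lapply(cols, function(k) rep(NA_real_, total))
  names(res) <- cols
  chromosome <- character(total)

  ## Pass 2: copy each sequence into its row range, then drop the source.
  for (i in seq_len(n)) {
    rows <- seq.int(starts[i], length.out = sizes[i])
    posInfo <- bpmap$SeqHead.PosInfo[[i]]$PositionInformation
    bpmap$SeqHead.PosInfo[[i]] <- NA
    stopifnot(all(names(posInfo) %in% cols))   # guard against unknown columns
    for (k in names(posInfo)) res[[k]][rows] <- posInfo[[k]]
    chromosome[rows] <- bpmap$SequenceDescription[[i]]$Name
    rm(posInfo)
  }
  data.frame(res, chromosome = chromosome, stringsAsFactors = FALSE)
}

tab <- bpmapToTable(readMockBpmap)
```

Inside makeBPMAPenv() the reader argument would presumably wrap the .Call() line quoted above (an assumption). The final data.frame() call may still duplicate the columns once.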

Also, and this is possible with the current implementation too,
setting the source elements in the 'bpmap' list to NA as soon as they
have been extracted should help the garbage collector free some memory,
e.g.

    tmp <- do.call("rbind", lapply(1:n, function(obj) {
        probeMappingType <- bpmap$SequenceDescription[[obj]]$ProbeMappingType
        name <- bpmap$SequenceDescription[[obj]]$Name
        bpmap$SequenceDescription[[obj]] <<- NA
        positionInformation <- bpmap$SeqHead.PosInfo[[obj]]$PositionInformation
        bpmap$SeqHead.PosInfo[[obj]] <<- NA
        tmp2 <- cbind(positionInformation, name)
        rm(positionInformation, name)
        n2 <- nrow(tmp2)
        if (bpmap$Header$version >= 3)
            if (probeMappingType == 1)
                tmp2 <- cbind(tmp2[, 1:2], x.mm = NA, y.mm = NA, tmp2[, -c(1:2)])
        return(tmp2)
    }))

I haven't tried the above, but ideally you would do this in a for-loop,
so that you can assign the NAs without having to rely on the "ad hoc"
'<<-' operator.
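An untested sketch of the same idea written as a for-loop, so the source elements can be set to NA with ordinary assignment instead of '<<-'. It keeps the per-sequence pieces and the final rbind(). As in the code above, the BPMAP field names are taken from makeBPMAPenv(), but `readMockBpmap()`, `bpmapPieces()` and the mock's column names are hypothetical; the chromosome column is named directly in data.frame() rather than via names(tmp)[10]:

```r
## Hypothetical stand-in for the affyio reader; the PositionInformation
## column names are invented for illustration.
readMockBpmap <- function() {
  list(Header = list(version = 3),
       SequenceDescription = list(list(Name = "chr1", ProbeMappingType = 0),
                                  list(Name = "chr2", ProbeMappingType = 1)),
       SeqHead.PosInfo = list(
         list(PositionInformation = data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                                               x.mm = c(1, 2, 3), y.mm = c(5, 6, 7),
                                               pos = c(100, 200, 300))),
         list(PositionInformation = data.frame(x = c(7, 8), y = c(9, 10),
                                               pos = c(400, 500)))))
}

## Loop version of the NA-ing idea: pieces go into a pre-sized list and
## source elements are dropped with plain assignment (no '<<-' needed,
## because 'bpmap' is a local variable of the same function).
bpmapPieces <- function(readBpmap) {
  bpmap <- readBpmap()   # sole reference to the parsed BPMAP list
  n <- length(bpmap$SequenceDescription)
  pieces <- vector("list", n)
  for (i in seq_len(n)) {
    desc <- bpmap$SequenceDescription[[i]]
    bpmap$SequenceDescription[[i]] <- NA
    posInfo <- bpmap$SeqHead.PosInfo[[i]]$PositionInformation
    bpmap$SeqHead.PosInfo[[i]] <- NA
    tmp2 <- data.frame(posInfo, chromosome = desc$Name, stringsAsFactors = FALSE)
    rm(posInfo)
    ## PM-only sequences get NA mismatch coordinates after column 2.
    if (bpmap$Header$version >= 3 && desc$ProbeMappingType == 1)
      tmp2 <- data.frame(tmp2[, 1:2], x.mm = NA, y.mm = NA,
                         tmp2[, -(1:2), drop = FALSE], stringsAsFactors = FALSE)
    pieces[[i]] <- tmp2
  }
  do.call(rbind, pieces)   # rbind.data.frame matches columns by name
}

tab <- bpmapPieces(readMockBpmap)
```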

Moreover, adding a gc() call at the end of the internal function (and
possibly also after the lapply()) could help.

Finally, you can also save a copy or two by splitting up the
rbind()/lapply() structure as:

    tmp <- lapply(1:n, function(obj) { ... })
    rm(bpmap)
    gc()
    tmp <- do.call("rbind", tmp)
    gc()
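To put numbers on the savings, one option is to reset R's memory statistics with gc(reset = TRUE) and read the "max used" column of gc() afterwards. This is a sketch: the helper `peakMb()` and the toy workload are hypothetical. The reported peak includes memory already in use before the call, so compare the old and new code from the same starting state:

```r
## Peak memory (Mb) while evaluating 'expr', from R's own gc() statistics:
## gc(reset = TRUE) clears the "max used" counters, and the "(Mb)" column
## that follows "max used" reports the peak since that reset.
peakMb <- function(expr) {
  invisible(gc(reset = TRUE))
  force(expr)                       # evaluate the lazy argument now
  g <- gc()
  sum(g[, which(colnames(g) == "max used") + 1])
}

## Toy workload (made-up sizes): 200 data frames combined with rbind().
pieces <- lapply(1:200, function(i) data.frame(a = runif(1000), b = runif(1000)))
viaRbind <- peakMb(do.call(rbind, pieces))
viaRbind
```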

That's my $0.02 (I'm keen to hear how much the memory overhead goes
down, if anyone tries this).

/Henrik

On 12/12/2007, James W. MacDonald <jmacdon at med.umich.edu> wrote:
> Hi Joseph,
>
> You might be able to analyze your data on a Windows box, but I don't
> think you will be able to create the platform design package, which is
> going to take lots of memory. Probably your best bet is to use a 64-bit
> Linux box with 6-8 GB of RAM.
>
> Best,
>
> Jim
>
>
>
> joseph wrote:
> > Hi Benilton
> > I am using the Mouse Promoter 1.0R Array.
> > So far I have not been successful in creating the platform design environment. Please see the error and the sessionInfo() output below. Also, could you please point me to the document by Naira that you mentioned, or are you referring to her entry from last month, "Oligo package and tiling arrays - PM and MM"?
> > Joseph
> >
> > makePDpackage("Mm_PromPR_v01_NCBIv35.bpmap", manufacturer="affymetrix",
> >               type="tiling")
> > affymetrix tiling
> > The package will be called pd.mm.prompr.v01.ncbiv35
> > Error: cannot allocate vector of size 16.1 Mb
> > In addition: Warning messages:
> > 1: In names(tmp)[10] <- "chromosome" :
> >   Reached total allocation of 1535Mb: see help(memory.size)
> > 2: In names(tmp)[10] <- "chromosome" :
> >   Reached total allocation of 1535Mb: see help(memory.size)
> > 3: In names(tmp)[10] <- "chromosome" :
> >   Reached total allocation of 1535Mb: see help(memory.size)
> > 4: In names(tmp)[10] <- "chromosome" :
> >   Reached total allocation of 1535Mb: see help(memory.size)
> >
> >> sessionInfo()
> > R version 2.6.1 Patched (2007-12-08 r43617)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] splines   tools     stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> >  [1] makePlatformDesign_1.2.0 oligo_1.2.2              oligoClasses_1.0.3       affxparser_1.10.2
> >  [5] AnnotationDbi_1.0.6      preprocessCore_1.0.0     RSQLite_0.6-4            DBI_0.2-4
> >  [9] Biobase_1.16.1           affyio_1.6.1
> >
> > ----- Original Message ----
> > From: Benilton Carvalho <bcarvalh at jhsph.edu>
> > To: joseph <jdsandjd at yahoo.com>
> > Cc: bioconductor at stat.math.ethz.ch
> > Sent: Wednesday, December 12, 2007 8:31:38 AM
> > Subject: Re: Affy Tiling arrays & oligo
> >
> >
> > Dear Joseph,
> >
> > As of now, oligo can read in the intensities from your CEL files if you create the platform design environment as described by Naira. I don't have methods specific to tiling arrays implemented in oligo yet, but you can benefit from other packages that do, provided they use standard Biobase objects.
> >
> >
> > What array are you using?
> >
> >
> > b
> >
> > On Dec 12, 2007, at 11:25 AM, joseph wrote:
> >
> > Hello
> > The vignette "An Introduction to the Oligo Package" addresses only Affymetrix SNP arrays. Does anyone have an example of how to analyze Affy tiling arrays with oligo and is willing to share it?
> > Thanks
> > Joseph Dhahbi
> > CHORI
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
>


