[BioC] bout big data set for Affy R packge

Steve Piccolo stephen.piccolo at hsc.utah.edu
Sat Dec 22 16:52:43 CET 2012


Wei,

I'm assuming your end goal is to normalize the files?

If so, there are a few other options you could try for a large number of
CEL files. You could process the CEL files in smaller groups.
Alternatively (and in my opinion, a better approach), you could use our
SCAN.UPC package (or the frma package), both of which are designed to
normalize one file at a time, so you only need enough memory for a
single CEL file.
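
A minimal sketch of the "smaller groups" idea in base R follows. The
helper process_in_groups() and the stand-in count_files() are
hypothetical names used only for illustration; in practice the second
function would wrap whatever normalization call you settle on (for
example a per-file call from SCAN.UPC or frma), which is not shown here.

```r
# Split a vector of CEL file paths into groups of at most `group_size`
# and apply `process_group` to each group in turn, so that only one
# group has to be handled at a time.
process_in_groups <- function(cel_files, group_size, process_group) {
  stopifnot(group_size >= 1)
  # Group index for each file: 1, 1, ..., 2, 2, ...
  group_id <- ceiling(seq_along(cel_files) / group_size)
  groups <- split(cel_files, group_id)
  lapply(groups, process_group)
}

# Stand-in for the real work; it only reports how many files it received.
count_files <- function(files) length(files)

cel_files <- sprintf("sample_%03d.CEL", 1:300)  # hypothetical file names
group_sizes <- process_in_groups(cel_files, 50, count_files)
print(unlist(group_sizes))
```

Setting group_size to 1 gives the one-file-at-a-time case.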

Regards,
-Steve


On Sat, Dec 22, 2012 4:00 AM,
"bioconductor-request at r-project.org" <bioconductor-request at r-project.org>
wrote:

>Date: Sat, 22 Dec 2012 15:31:51 +1100
>From: Rob Dunne <Rob.Dunne at csiro.au>
>To: Benilton Carvalho <beniltoncarvalho at gmail.com>
>Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
>Subject: Re: [BioC] bout big data set for Affy R packge
>
>Hi Benilton,
>
>Unless I am missing something, ff won't help in this case. From the ff
>help page:
>
>"Currently ff objects cannot have length zero and are limited to
>'.Machine$integer.max' elements"
>
>and .Machine$integer.max is 2^31 - 1. This limit is exceeded when you
>try to load 328 Affy exon arrays, hence:
>
>library(ff)
>library(oligo)
>data<-read.celfiles(filenames=files)
>#Loading required package: pd.huex.1.0.st.v2
>#Loading required package: RSQLite
>#Loading required package: DBI
>#Platform design info loaded.
>#Error in if (length < 0 || length > .Machine$integer.max) stop("length
>#  must be between 1 and .Machine$integer.max") :
>#  missing value where TRUE/FALSE needed
>#In addition: Warning message:
>#In ff(initdata = initdata, vmode = vmode, dim = dim, pattern =
>#  file.path(ldPath(),  :
>#  NAs introduced by coercion
>
>  traceback()
>#4: ff(initdata = initdata, vmode = vmode, dim = dim, pattern =
>#       file.path(ldPath(),
>#       basename(name)))
>#3: createFF("intensities-", dim = c(nr, length(filenames)))
>#2: smartReadCEL(filenames, sampleNames, headdetails = headdetails)
>#1: read.celfiles(filenames = ff)
>
>This is why I went down the path of modifying read.celfiles to use
>big.matrix, which does not have the 2^31 - 1 limit.
>
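
The numbers in this thread can be checked directly in R. The sketch
below takes the 6553600 features per array from the assayData line
quoted further down and compares the number of matrix cells with
.Machine$integer.max. The variable names are made up for illustration,
and the last lines only show behaviour that is consistent with the
warning and error above, not ff's actual internals.

```r
features <- 6553600                # features per array, from this thread
int_max  <- .Machine$integer.max   # 2^31 - 1 = 2147483647

cells_327 <- features * 327        # 2143027200, still fits
cells_328 <- features * 328        # 2149580800, too large

print(cells_327 <= int_max)        # TRUE
print(cells_328 >  int_max)        # TRUE

# Coercing the oversized count to integer yields NA (with a warning),
# and an NA inside an if() condition is an error.
len <- suppressWarnings(as.integer(cells_328))
print(is.na(len))                  # TRUE
```

So with this array type, 327 arrays would still fit under the limit and
328 is the first count that does not.
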
>Bye
>Rob
>
>
>
>
>
>
>sessionInfo()
>#R version 2.15.0 (2012-03-30)
>#Platform: x86_64-unknown-linux-gnu (64-bit)
>#
>#locale:
># [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
># [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
># [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
># [7] LC_PAPER=C                 LC_NAME=C
># [9] LC_ADDRESS=C               LC_TELEPHONE=C
>#[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>#
>#attached base packages:
>#[1] tools     stats     graphics  grDevices utils     datasets methods
>#[8] base
>#
>#other attached packages:
>#[1] pd.huex.1.0.st.v2_3.6.0 RSQLite_0.11.2 DBI_0.2-5
>#[4] oligo_1.20.4            oligoClasses_1.18.0 ff_2.2-10
>#[7] bit_1.1-9
>#
>#loaded via a namespace (and not attached):
># [1] affxparser_1.28.1     affyio_1.24.0 Biobase_2.16.0
># [4] BiocGenerics_0.2.0    BiocInstaller_1.4.9 Biostrings_2.24.1
># [7] codetools_0.2-8       compiler_2.15.0 foreach_1.4.0
>#[10] IRanges_1.14.4        iterators_1.0.6 preprocessCore_1.18.0
>#[13] splines_2.15.0        stats4_2.15.0 zlibbioc_1.2.0
>
>
>On 12/21/2012 10:45 PM, Benilton Carvalho wrote:
>> Hi Rob,
>>
>> looks like you're running an old version of oligo.
>>
>> Today, our approach is:
>>
>> library(ff)
>> library(oligo)
>> my.data <- read.celfiles(<CEL file names>)
>>
>> HTH,
>> b
>>
>> On 21 December 2012 01:02, Rob Dunne <Rob.Dunne at csiro.au> wrote:
>>> Hi Wei Liu,
>>>
>>> if they are Affymetrix 1.0 ST exon arrays, I can send you a modified
>>> version of read.celfiles from the oligo package that should read a
>>> 300-microarray data set. I don't know if it will work for other array
>>> types, possibly not without some work. It uses the big.matrix class
>>> from the bigmemory package:
>>>
>>> my.data<-read.celfiles(filenames=ff,useAffyio=FALSE)
>>> my.data
>>> #assayData: 6553600 features, 335 samples
>>> #Annotation: pd.huex.1.0.st.v2
>>>
>>> Bye
>>> Rob
>>>
>>>
>>>
>>>
>>> On 12/20/2012 01:21 AM, Wei Liu wrote:
>>>> Dear Buddy,
>>>> I am a user of the affy R package. When I attempt to handle a large
>>>> number (approx. 300) of microarrays, I always get a memory allocation
>>>> error from R. I searched the web but did not find any solution for
>>>> ReadAffy() with a large dataset. I do not know if the problem can be
>>>> fixed in some way. Any suggestions are appreciated. Thanks.
>>>>
>>>> Sincerely,
>>>> Wei Liu
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> --
>>> -
>>> Rob Dunne         Fax: +61 2 9325 3200     Tel: +61 2 9325 3263
>>> CSIRO Mathematics, Informatics and Statistics   +61 2 9325 3100
>>> Locked Bag 17, North Ryde, New South Wales, Australia, 1670
>>> http://www.bioinformatics.csiro.au Email: Rob.Dunne at csiro.au
>>>
>>>          Java has certainly revolutionized marketing and litigation.


