[BioC] Can't normalize 300+ HuGene arrays in xps

Tue Aug 31 11:15:46 CEST 2010

Hi Christian,

Changing the file name to "dataRMA" didn't help, so I captured the beginning of the error message, which appears right after the "computing common mean" step finishes. It says: 

SysError in <TFile::WriteBuffer>: error writing to file F:/Auswertung/GENEPI_combined/dataRMA.root (-1) (No space left on device)

So it seems to be a storage problem. However, there are still 95 GB free on my hard disk (F:), and the drive where I run ROOT and R has approx. 11 GB of free space. I will try another drive with more than 200 GB of free space and see whether the error still occurs.
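A rough size estimate helps put these free-space figures in perspective. The sketch below is plain base R and rests on assumptions that do not come from xps or from this thread: each HuGene 1.0 ST array is taken to have a 1050 x 1050 feature grid, and every intensity is assumed to be stored uncompressed as an 8-byte double. The real ROOT files will differ because of compression and per-tree overhead.

```r
# Order-of-magnitude size of one uncompressed copy of the probe-level data.
# ASSUMPTIONS (not taken from xps): 1050 x 1050 features per HuGene 1.0 ST
# array, each intensity stored as an 8-byte double, no ROOT compression
# and no per-tree overhead.
n_arrays   <- 324           # CEL files in the combined batch
n_features <- 1050 * 1050   # assumed features per array
bytes_per  <- 8             # assumed bytes per stored value

total_bytes <- n_arrays * n_features * bytes_per
total_gb    <- total_bytes / 1e9      # decimal GB, for the disk figures
total_mib   <- total_bytes / 1024^2   # MiB, for comparison with memory.limit()

cat(sprintf("~%.2f GB per copy (~%.0f MiB)\n", total_gb, total_mib))

# Free space reported in this thread: F: and the drive running R/ROOT.
limits_gb <- c(free_F = 95, free_R_drive = 11)
print(total_gb < limits_gb)

# The original script raised the R memory limit with memory.limit(size = 3000).
cat(sprintf("above the 3000 limit? %s\n", total_mib > 3000))
```

Under these assumptions one copy of the data is about 2.9 GB, far below the free space on either drive, while the same data held in RAM (roughly 2700 MiB) would already come close to the 3000 passed to memory.limit(). Treat this only as an order-of-magnitude check.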

Best, Mike

-----Original Message-----
From: cstrato <cstrato at aon.at>
Sent: 30.08.2010 21:37:44
To: Mike Walter <michael_walter at email.de>
Subject: Re: [BioC] Can't normalize 300+ HuGene arrays in xps

>Dear Mike,
>
>First, I am glad to hear that the stepwise approach finally worked.
>
>Thank you also for sending me the screenshot, which shows the following 
>message repeated many times:
>
>This error is symptomatic of a Tree created as a memory-resident Tree
>Instead of doing:
>    TTree *T = new TTree(...);
>    TFile *f = new TFile(...);
>you should do:
>    TFile *f = new TFile(...);
>    TTree *T = new TTree(...);
>
>Since I always create the TFile before creating new TTree(s), this 
>means that for some reason the connection to the TFile got lost, so the 
>trees are kept in RAM. With only 6 trees this is no problem, but 
>with 324 trees you get this error message. Sadly, the beginning of the 
>error messages is lost, so I do not know whether the TFile was created 
>or not.
>
>Thus, at the moment I have no idea what might be the reason for this 
>problem and until now this error has never been reported.
>
>I would really appreciate it if you could try to run rma() with 
>'filename = "dataRMA"' instead of 'filename = "tmpdt_dataRMA"' and let 
>me know whether the problem remains.
>
>Best regards
>Christian
>
>
>On 8/30/10 1:23 PM, Mike Walter wrote:
>> Dear Christian,
>>
>> Thanks for your help. To answer your questions first: I normally use RGui, and my free disk space was ~100 GB. I also tried the add.data=FALSE option, without success.
>>
>> So I ran RMA normalization on 6 arrays in RTerm, as you proposed. This worked fine, so I then tried to run RMA on all arrays in RTerm. Here, I got thousands of error messages after the "computing common mean" step had finished for all arrays. After approx. 20 min of error messages scrolling across my screen, Windows terminated R, so I couldn't copy any output. I took a screenshot, which is attached (although it might not make it through to the BioC list).
>>
>> Therefore, I tried the stepwise approach in RTerm. To my great surprise, everything now worked fine. There was no error when I started the quantile normalization with the same code as before (except for verbose=TRUE). The subsequent median polish also worked. The RTerm output is pasted below.
>>
>> So again, thank you very much for your help.
>>
>> Kind regards,
>>
>> Mike
>>
>>
>>> data.norm = normalize.quantiles(data.bkgd, filename = "quantile", filedir = $
>> + tmpdir = "", update = FALSE, exonlevel = exonlevel, verbose = TRUE)
>> Opening file<X:/affy/QC_Scripts/xps/schemes/Scheme_HuGene10stv1r4_na30_hg19.root>  in<READ>  mode...
>> Creating new file<F:/Auswertung/GENEPI_combined/quantile.root>...
>> Opening file<F:/Auswertung/GENEPI_combined/bkgd_correct.root>  in<READ>  mode...
>>
>> Preprocessing data using method...
>>   Normalizing raw data...
>>   normalizing data using method...
>>   setting selector mask for typepm<9216>
>>   finished filling<324>  arrays.
>>   computing common mean...
>>   finished filling<324>  trees.
>>   preprocessing finished.
>>> save.image("F:/Auswertung/GENEPI_combined/GENEPI_all_stepwise.RData")
>>> data.mp = summarize.rma(data.norm, filename = "medianpolish", filedir = getw$
>> +   update = FALSE, option = "transcript", exonlevel = exonlevel, xps.scheme =$
>> Opening file<X:/affy/QC_Scripts/xps/schemes/Scheme_HuGene10stv1r4_na30_hg19.root>  in<READ>  mode...
>> Creating new file<F:/Auswertung/GENEPI_combined/medianpolish.root>...
>> Opening file<F:/Auswertung/GENEPI_combined/quantile.root>  in<READ>  mode...
>> Preprocessing data using method...
>>   Converting raw data to expression levels...
>>   summarizing with<medianpolish>...
>>   setting selector mask for typepm<9216>
>>   setting selector mask for typepm<9216>
>>   calculating expression for<28829>  of<33664>  units...Finished.
>>   expression statistics:
>>   minimal expression level is<3.11771>
>>   maximal expression level is<20015.1>
>>   preprocessing finished.
>> Opening file<X:/affy/QC_Scripts/xps/schemes/Scheme_HuGene10stv1r4_na30_hg19.root>  in<READ>  mode...
>> Opening file<F:/Auswertung/GENEPI_combined/medianpolish.root>  in<READ>  mode...
>>
>> Opening file<F:/Auswertung/GENEPI_combined/medianpolish.root>  in<READ>  mode...
>>
>> Exporting data from tree<*>  to file<F:/Auswertung/GENEPI_combined/medianpolish.txt>...
>> Reading entries from<HuGene-1_0-st-v1.ann>  ...Finished
>> <28829>  of<28829>  records exported.
>>
>> -----Original Message-----
>> From: cstrato <cstrato at aon.at>
>> Sent: 27.08.2010 21:05:46
>> To: Mike Walter <michael_walter at email.de>
>> Subject: Re: [BioC] Can't normalize 300+ HuGene arrays in xps
>>
>>> Dear Mike,
>>>
>>> In case your problem turns out to be memory-related, you
>>> can use rma(..., add.data=FALSE, ...), which will prevent filling slot
>>> "data" with the expression levels. You can then import all normalized
>>> data or parts thereof using "export.expr()" or "root.expr()", as
>>> shown in the help files.
>>>
>>> Thus you could first run rma and then import the results in a separate step:
>>>
>>> ## rma
>>>> data.rma <- rma(data.xps, "tmpdt_dataRMA", background="antigenomic",
>>> normalize=T, exonlevel=exonlevel,  add.data=FALSE, verbose = TRUE)
>>>
>>> ## import subset of trees:
>>> ds <- export.expr(data.rma, treenames=c("name1.mdp","name3.mdp", etc),
>>> treetype="mdp", varlist="fUnitName:fSymbol:fLevel", outfile="tmp.txt",
>>> as.dataframe=TRUE)
>>>
>>> ## use subset of trees
>>>> sub.rma <- root.expr(scheme.test3, "tmpdt_dataRMA.root", "mdp",
>>> c("name1.mdp", "name2", etc))
>>>> str(sub.rma)
>>>
>>> Maybe, after starting a new R session, you will be able to import all
>>> trees with "treenames='*'".
>>>
>>> Please let me know if this could solve your problem.
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>> On 8/27/10 3:35 PM, Mike Walter wrote:
>>>> Hi all,
>>>>
>>>> I have a set of 324 HuGene 1.0 arrays that I'd like to normalize in one batch on a "normal" Windows computer. I already normalized the arrays successfully with xps in two sets of 180 and 144 samples. When I apply the code below to process all samples together, my R session just crashes.
>>>>
>>>> library(xps)
>>>> memory.limit(size=3000) # I modified my boot.ini to allow more memory. At least I hope it works.
>>>> exonlevel=rep((8192+1024),3)
>>>> scheme="Scheme_HuGene10stv1r4_na30_hg19.root"
>>>> gene.scheme <- root.scheme(paste("X:/affy/QC_Scripts/xps/schemes",scheme,sep="/"))
>>>> data.xps = root.data(gene.scheme, paste(getwd(),"Genepi_all_cel.root",sep="/"))
>>>> data.rma <- rma(data.xps, "tmpdt_dataRMA", background="antigenomic", normalize=T,
>>>>                       exonlevel=exonlevel, verbose = FALSE)
>>>>
>>>>
>>>> Thus, I tried to do the RMA stepwise. The background correction succeeded, but I get an error when trying to do the quantile normalization:
>>>>
>>>> data.bkgd = bgcorrect.rma(data.xps, filename = "bkgd_correct",
>>>>                       filedir = getwd(), tmpdir = "", update = FALSE,
>>>>                       select = "antigenomic", exonlevel = exonlevel, verbose = FALSE)
>>>>
>>>> data.norm = normalize.quantiles(data.bkgd, filename = "quantile", filedir = getwd(),
>>>>                        tmpdir = "", update = FALSE, exonlevel = exonlevel, verbose = FALSE)
>>>>
>>>> OR
>>>>
>>>>
>>>> data.norm = normalize(data.bkgd, "quantile", filedir=getwd(), tmpdir="",
>>>>                       method="quantile", select="pmonly", option="transcript:together:none",
>>>>                       logbase="0", params=c(0.0), exonlevel=exonlevel)
>>>>
>>>>
>>>> In both cases the output is "Error in .local(object, ...) : error in function ‘Normalize’". I guess it is just a wrong option somewhere. I also tried exonlevel="metacore+affx", with the same result. Can anyone give me a hint as to what might be missing?
>>>>
>>>> Thank you very much.
>>>>
>>>> Best,
>>>> Mike
>>>>
>>>>> sessionInfo()
>>>> R version 2.10.1 (2009-12-14)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
>>>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=German_Germany.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] xps_1.6.4
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.10.1