[BioC] QuasR on Linux Cluster

Michael Stadler michael.stadler at fmi.ch
Mon Oct 21 17:48:53 CEST 2013


Your cluster object seems functional now.

Another possible problem could be insufficient free disk space in R's
tempdir(). qAlign uses it to temporarily store the uncompressed fastq
files, the sam files and the bam files, so it needs several-fold more
free capacity than the size of your fastq.gz files. For more information,
see vignette section 4.1 "File storage locations".
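
A rough way to check this is to compare the total size of your input
files with the free space where tempdir() lives. The following is only a
sketch: the helper names totalSizeGB and showTempSpace are made up for
illustration, the file names in the usage comment are placeholders, and
the "df" call assumes a Linux/Unix system.

```r
# Sum the sizes of the given files, in GB.
# file.size() returns NA for files that do not exist, so check the paths first.
totalSizeGB <- function(files) {
    sum(file.size(files)) / 1024^3
}

# Print the free space on the file system that holds R's tempdir().
# Relies on the Unix "df" tool, so this works on Linux/Unix-like systems only.
showTempSpace <- function(dir = tempdir()) {
    system2("df", c("-h", shQuote(dir)))
}

# Usage (placeholder file names, replace with your own fastq.gz files):
#   totalSizeGB(c("sample1.fastq.gz", "sample2.fastq.gz"))
#   showTempSpace()
```

If the free space reported by df is not several times larger than the
total input size, disk space is a likely culprit.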

If tempdir() is too small, you can redirect R's tempdir() by setting
the TMPDIR environment variable before starting R, or just for one
qAlign call by using the "cacheDir" parameter of qAlign.
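
As a sketch of both options: TMPDIR has to be set in the shell or job
script before R is started, because tempdir() is fixed at startup. The
directory "bigDir" and the wrapper alignWithCache below are placeholders
that I made up for illustration; in practice bigDir should point to a
directory on a file system with plenty of free space.

```r
# Option 1: set TMPDIR before starting R, e.g. in the shell or job script:
#   export TMPDIR=/path/on/large/disk     (placeholder path)
# Then verify inside R what is actually being used:
Sys.getenv("TMPDIR")
tempdir()

# Option 2: pass a directory to qAlign via cacheDir, for that one call only.
# "bigDir" is a placeholder; it is created below tempdir() here only so that
# the sketch runs. Use a directory on a large file system instead.
bigDir <- file.path(tempdir(), "quasr_cache")
dir.create(bigDir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical wrapper around the qAlign call from this thread,
# with the cacheDir argument added.
alignWithCache <- function(sampleFile, genomeName, cl, cacheDir) {
    QuasR::qAlign(sampleFile, genome = genomeName, splicedAlignment = TRUE,
                  clObj = cl, cacheDir = cacheDir)
}

# Usage:
#   proj <- alignWithCache("sampleFile.txt", "BSgenome.Mmusculus.UCSC.mm10",
#                          cl, bigDir)
```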

If you are sure that disk space is not the issue, could you give qAlign()
another try, using a cluster object with only 4 nodes to avoid any
memory issues?
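
Something along these lines should do, reusing the cluster test from my
earlier mail. This is a minimal sketch; the qAlign call is the one from
your original message and is left commented out here, so uncomment it
once the cluster test looks fine.

```r
library(parallel)

# Use only 4 nodes to keep the total memory usage below the 16G limit.
cl <- makeCluster(4)

# Quick check that every node responds; should return one host name per node.
info <- parLapply(cl, seq_along(cl), function(i) Sys.info()[["nodename"]])
unlist(info)

# Alignment call from the original message (requires QuasR and the sample file):
# proj <- qAlign(sampleFile, genome = genomeName, splicedAlignment = TRUE,
#                clObj = cl)

# Release the worker processes when done.
stopCluster(cl)
```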

Michael


On 21.10.2013 15:09, Ugo Borello wrote:
> Thank you Michael,
> My bad, I am not able to find the QuasR_log at the moment. Anyway the last
> step was the .sam file. QuasR was not proceeding in converting the .sam file
> to a .bam file.
> In attachment some other info on the running job before death.
> Those refer to a case where cl<- makeCluster(1).
> 
> 
> I ran your test and I got:
>> library(parallel)
>> cl<- makeCluster(detectCores())
>> info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>> info
> [[1]]
>                              sysname                              release
>                              "Linux"                 "2.6.18-348.3.1.el5"
>                              version                             nodename
> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>                              machine                                login
>                             "x86_64"                            "unknown"
>                                 user                       effective_user
>                           "uborello"                           "uborello"
> 
> The same for the 32 nodes.
> 
> Then I ran:
>> library(parallel)
>> type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>> type
> [1] "PSOCK"
>> cores <- getOption("mc.cores", detectCores())
>> cl <- makeCluster(cores, type=type)
>> cl
> socket cluster with 32 nodes on host 'localhost'
>> results <- parLapply(cl, 1:100, sqrt)
>> sum(unlist(results))
> [1] 671.4629
>> stopCluster(cl)
> 
> I don't know if this could help.
> 
> Any suggestions?
> 
> Ugo
> 
> 
> 
>> From: Michael Stadler <michael.stadler at fmi.ch>
>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>> To: <bioconductor at r-project.org>
>> Subject: Re: [BioC] QuasR on Linux Cluster
>>
>> Hi Ugo,
>>
>> On 18.10.2013 13:56, Ugo Borello wrote:
>>> Hi all,
>>> I am trying to use QuasR on a Linux cluster: 1 machine, multiple cores.
>>>
>>> I run:
>>> library(QuasR)
>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>
>>> cl <- makeCluster(1)
>>>
>>> sampleFile <- "sampleFile.txt"
>>>
>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>
>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>> clObj=cl)
>>>
>>> And I get
>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>> clObj=cl)
>>> alignment files missing - need to:
>>>     create 1 genomic alignment(s)
>>> Testing the compute nodes...OK
>>> Loading QuasR on the compute nodes...OK
>>> Available cores:
>>> nodeNames
>>> ccwsge0155
>>>          1
>>> Performing genomic alignments for 1 samples. See progress in the log file:
>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>> Error in unserialize(node$con) : error reading from connection
>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>> Execution halted
>>
>> The error that you get is not created within QuasR; my guess is that it
>> comes from the "parallel" package, indicating that something goes wrong
>> when using your cluster object "cl".
>>
>> I would suggest testing whether your cluster object works fine. It would
>> help to know if the error message appears immediately after you call
>> qAlign(), or if it takes some time to process. Also, it would be great
>> to see the content of the QuasR log file.
>>
>> Here is a simple test you could try to check your cluster object/connection:
>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>
>> As a result, you should get Sys.info() output from each of the cluster
>> nodes.
>>
>>
>>>
>>> I also tried to modify the multicore option
>>>
>>> cl <- makeCluster(detectCores())
>>>
>>> And my job is killed because it uses more memory ( Max vmem = 17.118G) than
>>> allowed (16G)
>> With splicedAlignment=TRUE, QuasR will run spliceMap for aligning your
>> reads, which may require several GB of memory per node in your cluster
>> object. You can avoid the memory overflow by reducing the number of
>> nodes in your cluster object, e.g. by:
>>
>> cl <- makeCluster(4)
>>
>> which should run through on your machine with 16GB of memory.
>>
>> Best,
>> Michael
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


