[BioC] QuasR on Linux Cluster

Mon Oct 21 11:30:27 CEST 2013

Hi Ugo,

On 18.10.2013 13:56, Ugo Borello wrote:
> Hi all,
> I am trying to use QuasR on a Linux cluster: 1 machine, multiple cores.
>
> I run:
> library(QuasR)
> library(BSgenome.Mmusculus.UCSC.mm10)
>
> cl <- makeCluster(1)
>
> sampleFile <- "sampleFile.txt"
>
> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>
> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
> clObj=cl)
>
> And I get
>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
> clObj=cl)
> alignment files missing - need to:
>     create 1 genomic alignment(s)
> Testing the compute nodes...OK
> Loading QuasR on the compute nodes...OK
> Available cores:
> nodeNames
> ccwsge0155
>          1
> Performing genomic alignments for 1 samples. See progress in the log file:
> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
> Error in unserialize(node$con) : error reading from connection
> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
> Execution halted

The error that you get is not created within QuasR; my guess is that it
comes from the "parallel" package, indicating that something goes wrong
when using your cluster object "cl".

I would suggest testing whether your cluster object works fine. It would
help to know if the error message appears immediately after you call
qAlign(), or if it takes some time to process. Also, it would be great
to see the content of the QuasR log file.

Here is a simple test you could try to check your cluster object/connection:
parLapply(cl, seq_along(cl), function(i) Sys.info())

As a result, you should get Sys.info() output from each of the cluster
nodes.
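
A slightly more defensive version of the same test is sketched below. It
assumes a single local worker, as in your makeCluster(1) call, and only
uses functions from the "parallel" package. If it fails, the problem is
likely in the cluster setup rather than in QuasR.

```r
library(parallel)

# One local worker, as in the original makeCluster(1) call
cl <- makeCluster(1)

# Ask every worker for its Sys.info(); catch the error instead of halting
info <- tryCatch(
    parLapply(cl, seq_along(cl), function(i) Sys.info()),
    error = function(e) e
)

if (inherits(info, "error")) {
    message("Cluster test failed: ", conditionMessage(info))
} else {
    # One entry per worker; show the host name each worker reports
    print(vapply(info, function(x) x[["nodename"]], character(1)))
}

stopCluster(cl)
```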

> 
> I also tried to modify the multicore option
> 
> cl <- makeCluster(detectCores())
> 
> And my job is killed because it uses more memory ( Max vmem = 17.118G) than
> allowed (16G)

With splicedAlignment=TRUE, QuasR runs SpliceMap to align your
reads, which may require several GB of memory per node in your cluster
object. You can stay under the memory limit by reducing the number of
nodes in your cluster object, e.g. by:

cl <- makeCluster(4)

which should run through on your machine with 16GB of memory.
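
If you prefer not to hard-code the number, a rough sketch is to derive it
from the memory limit. The 4 GB per node used here is only an assumption
(16 GB divided over 4 nodes), not a measured requirement of SpliceMap, and
the variable names are made up for illustration.

```r
library(parallel)

mem_limit_gb    <- 16  # job memory limit reported by the scheduler
mem_per_node_gb <- 4   # ASSUMPTION: rough per-node need, adjust after a test run

# detectCores() may return NA on some platforms; fall back to one core
cores <- detectCores()
if (is.na(cores)) cores <- 1L

# Never use more nodes than cores, or more than the memory budget allows
n_nodes <- as.integer(max(1, min(cores, floor(mem_limit_gb / mem_per_node_gb))))

cl <- makeCluster(n_nodes)
```

The resulting cl can then be passed to qAlign() via clObj=cl as before;
call stopCluster(cl) once the alignments are done.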

Best,
Michael