[BioC] QuasR on Linux Cluster

Michael Stadler michael.stadler at fmi.ch
Tue Oct 22 11:50:05 CEST 2013


I can see from the intermediate files that SpliceMap was stopped halfway
through, before it could create the single sam file with spliced alignments.

QuasR tries to detect such cases in the child R process (one of the R
processes spawned in your cluster object) and throws an error with a
descriptive message. However, you do not get this error message. Rather,
you get an error indicating that the parent R process lost its
connection to the child R process.
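
As a rough illustration of that second kind of error (not of what QuasR
does internally), the sketch below lets a single worker exit in the
middle of a task. The quit() call is only a stand-in I made up for a
child R process that is killed from outside; everything else is plain
"parallel".

```r
library(parallel)

cl <- makeCluster(1)  # one socket worker on the local machine

# The worker exits in the middle of the task, standing in for a child R
# process that is terminated from outside (an assumption for illustration).
res <- tryCatch(
  parLapply(cl, 1, function(i) quit(save = "no")),
  error = function(e) e
)

# The parent only learns that the connection broke, not why the worker died.
print(conditionMessage(res))

# The worker is already gone, so ignore errors while cleaning up.
try(stopCluster(cl), silent = TRUE)
```

The message printed by the parent should resemble the unserialize()
error quoted further down in this thread, although the exact wording
may depend on the R version.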

It's hard to diagnose this from afar, so I'll have to guess. Could
it be that the child R process is terminated and therefore neither able
to signal failure, nor to communicate with the parent R process? Can you
give more details about your setup, e.g. if you are running some batch
or queueing system that controls job execution?
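
One way to collect such details from within the R job is sketched
below. The list of environment variables is only my guess at what
common schedulers set (JOB_ID and NSLOTS for Grid Engine, the others
for Slurm, PBS and LSF); describe_job_env() is a made-up helper, not
part of QuasR or "parallel".

```r
library(parallel)

# Collect a few facts about the environment the R job runs in.
# The variable names are assumptions about what a scheduler might set;
# variables that are not set come back as NA.
describe_job_env <- function() {
  vars <- c("JOB_ID", "NSLOTS", "SLURM_JOB_ID", "PBS_JOBID", "LSB_JOBID", "TMPDIR")
  list(
    scheduler_vars = Sys.getenv(vars, unset = NA),
    cores          = detectCores(),
    tempdir        = tempdir(),
    nodename       = unname(Sys.info()["nodename"])
  )
}

info <- describe_job_env()
print(info)
```

Running this inside the same job script that calls qAlign() would show
whether a scheduler is involved and where R's temporary files end up.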

Other things that may help to narrow down the problem are rerunning
qAlign() on a subset of the dataset, or without a cluster object. It may
also help to know a bit more about the sample you are trying to analyse
(read length, read number, sequence file format).
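
A minimal sketch of making such a subset with base R follows. It
assumes the usual four-lines-per-record fastq layout; subset_fastq()
and the file names in the commented usage are made up for illustration.

```r
# Write the first n_reads records of a (possibly gzipped) fastq file to a
# new gzipped file. Assumes exactly four lines per record.
subset_fastq <- function(infile, outfile, n_reads = 100000L) {
  con_in <- gzfile(infile, "r")  # gzfile() also reads uncompressed files
  on.exit(close(con_in), add = TRUE)
  lines <- readLines(con_in, n = 4L * n_reads)

  con_out <- gzfile(outfile, "w")
  on.exit(close(con_out), add = TRUE)
  writeLines(lines, con_out)

  invisible(length(lines) %/% 4L)  # number of records written
}

# Hypothetical usage; both file names are placeholders:
# subset_fastq("reads.fastq.gz", "reads_small.fastq.gz")
# proj <- qAlign("sampleFile_small.txt",
#                genome = "BSgenome.Mmusculus.UCSC.mm10",
#                splicedAlignment = TRUE)  # clObj left out: no cluster object
```

The sample file passed to qAlign() would then have to point to the
smaller fastq file.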

Michael




On 22.10.2013 10:43, Ugo Borello wrote:
> Dear Michael,
> I think that the disk space is not an issue; anyway, I will double check
> with the administrator.
> 
> I used 4 nodes and QuasR stopped at the .sam file. See the output files in
> attachment.
> 
> When I use fewer than 4 nodes, it stops at the beginning of the process:
> 
> [1] "Writing BSgenome to disk on ccwsge0144 :
> /scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"
> 
> 
> 
> What am I missing?
> 
> Thank you
> 
> Ugo
> 
> 
> 
>> From: Michael Stadler <michael.stadler at fmi.ch>
>> Date: Mon, 21 Oct 2013 17:48:53 +0200
>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>> Subject: Re: [BioC] QuasR on Linux Cluster
>>
>> Your cluster object seems functional now.
>>
>> Another possible problem could be available diskspace in R's tempdir().
>> It is used by qAlign to temporarily store the uncompressed fastq files,
>> the sam files and the bam files (and thus needs several-fold more free
>> capacity than the size of your fastq.gz files). For more information,
>> see vignette section 4.1 "File storage locations".
>>
>> If tempdir() is too small, you can redirect R's tempdir() by setting
>> the TMPDIR environment variable, or just for one qAlign call by using
>> the "cacheDir" parameter of qAlign.
>>
>> If you are sure that diskspace is not the issue, could you give qAlign()
>> another try, using a cluster object with only 4 nodes to avoid any
>> memory issues?
>>
>> Michael
>>
>>
>> On 21.10.2013 15:09, Ugo Borello wrote:
>>> Thank you Michael,
>>> My bad, I am not able to find the QuasR_log at the moment. Anyway the last
>>> step was the .sam file. QuasR was not proceeding in converting the .sam file
>>> to a .bam file.
>>> In attachment some other info on the running job before death.
>>> Those refer to a case where cl<- makeCluster(1).
>>>
>>>
>>> I ran your test and got:
>>>> library(parallel)
>>>> cl<- makeCluster(detectCores())
>>>> info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>> info
>>> [[1]]
>>>                              sysname                              release
>>>                              "Linux"                 "2.6.18-348.3.1.el5"
>>>                              version                             nodename
>>> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>>>                              machine                                login
>>>                             "x86_64"                            "unknown"
>>>                                 user                       effective_user
>>>                           "uborello"                           "uborello"
>>>
>>> The same for the 32 nodes.
>>>
>>> Then I ran:
>>>> library(parallel)
>>>> type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>>>> type
>>> [1] "PSOCK"
>>>> cores <- getOption("mc.cores", detectCores())
>>>> cl <- makeCluster(cores, type=type)
>>>> cl
>>> socket cluster with 32 nodes on host 'localhost'
>>>> results <- parLapply(cl, 1:100, sqrt)
>>>> sum(unlist(results))
>>> [1] 671.4629
>>>> stopCluster(cl)
>>>
>>> I don't know if this could help.
>>>
>>> Any suggestions?
>>>
>>> Ugo
>>>
>>>
>>>
>>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>>>> To: <bioconductor at r-project.org>
>>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>>
>>>> Hi Ugo,
>>>>
>>>> On 18.10.2013 13:56, Ugo Borello wrote:
>>>>> Hi all,
>>>>> I am trying to use QuasR on a Linux cluster: 1 machine/multiple cores.
>>>>>
>>>>> I run:
>>>>> library(QuasR)
>>>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>>>
>>>>> cl <- makeCluster(1)
>>>>>
>>>>> sampleFile <- "sampleFile.txt"
>>>>>
>>>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>>>
>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>> clObj=cl)
>>>>>
>>>>> And I get
>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>> clObj=cl)
>>>>> alignment files missing - need to:
>>>>>     create 1 genomic alignment(s)
>>>>> Testing the compute nodes...OK
>>>>> Loading QuasR on the compute nodes...OK
>>>>> Available cores:
>>>>> nodeNames
>>>>> ccwsge0155
>>>>>          1
>>>>> Performing genomic alignments for 1 samples. See progress in the log file:
>>>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>>>> Error in unserialize(node$con) : error reading from connection
>>>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>>>> Execution halted
>>>>
>>>> The error that you get is not created within QuasR; my guess is that it
>>>> comes from the "parallel" package, indicating that something goes wrong
>>>> when using your cluster object "cl".
>>>>
>>>> I would suggest testing whether your cluster object works fine. It would
>>>> help to know if the error message appears immediately after you call
>>>> qAlign(), or if it takes some time to process. Also, it would be great
>>>> to see the content of the QuasR log file.
>>>>
>>>> Here is a simple test you could try to check your cluster object/connection:
>>>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>
>>>> As a result, you should get Sys.info() output from each of the cluster
>>>> nodes.
>>>>
>>>>
>>>>>
>>>>> I also tried to modify the multicore option
>>>>>
>>>>> cl <- makeCluster(detectCores())
>>>>>
>>>>> And my job is killed because it uses more memory (Max vmem = 17.118G) than
>>>>> allowed (16G)
>>>> With splicedAlignment=TRUE, QuasR will run spliceMap for aligning your
>>>> reads, which may require several GB of memory per node in your cluster
>>>> object. You can avoid the memory overflow by reducing the number of
>>>> nodes in your cluster object, e.g. by:
>>>>
>>>> cl <- makeCluster(4)
>>>>
>>>> which should run through on your machine with 16GB of memory.
>>>>
>>>> Best,
>>>> Michael
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>


