[BioC] QuasR on Linux Cluster

Ugo Borello ugo.borello at inserm.fr
Tue Oct 22 15:08:12 CEST 2013


I will run more tests to understand where the problem is.

But I don't know if, in the meantime, this could help:
when I run qAlign from the R console on the server I get:

> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
clObj=cl)
alignment files missing - need to:
    create 1 genomic alignment(s)
will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
Testing the compute nodes...OK
Loading QuasR on the compute nodes...OK
Available cores:
nodeNames
ccage014 
       4 
Performing genomic alignments for 1 samples. See progress in the log file:
/sps/inter/isc/uborello/input/QuasR_log_14c95dfa8d40.txt
Error in checkForRemoteErrors(val) :
  one node produced an error: Error on ccage014 processing sample
/sps/inter/isc/uborello/input/133het.fastq : error in evaluating the
argument 'file' in selecting a method for function 'scanFaIndex': Error in
value[[3L]](cond) : 'open' index failed
  file: /tmp/RtmpnzdpVP/file722251e6f03c.fa
Calls: open ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
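
The file in the error above lives under /tmp/RtmpnzdpVP, i.e. in a worker's tempdir(). Below is a minimal sketch for checking where each worker's tempdir() points and how much space is free there, and for redirecting it through the TMPDIR variable mentioned in the quoted message of 21 Oct. Assumptions: a local two-worker PSOCK cluster stands in for the real one, `bigTmp` is a hypothetical directory name, and the `df` command is available (Linux/Unix). Only base R and the parallel package are used; this is not a QuasR diagnostic.

```r
library(parallel)

# Hypothetical scratch location with plenty of free space; replace it with a
# real path on your system (a sub-directory of tempdir() is used for the demo).
bigTmp <- file.path(tempdir(), "bigTmp")
dir.create(bigTmp, showWarnings = FALSE)

# TMPDIR is read when an R process starts, so set it BEFORE creating the
# cluster: locally started PSOCK workers inherit the parent's environment.
oldTmp <- Sys.getenv("TMPDIR", unset = NA)
Sys.setenv(TMPDIR = bigTmp)
cl <- makeCluster(2)

# Where does each worker write its temporary files?
workerTmp <- unlist(clusterEvalQ(cl, tempdir()))
print(workerTmp)

# Free space in each worker's tempdir, as reported by 'df'.
workerDf <- clusterEvalQ(cl,
    system2("df", c("-h", shQuote(tempdir())), stdout = TRUE))
print(workerDf)

stopCluster(cl)

# Restore the previous TMPDIR setting in the parent session.
if (is.na(oldTmp)) Sys.unsetenv("TMPDIR") else Sys.setenv(TMPDIR = oldTmp)
```

If the worker directories turn out to sit on a small or nearly full file system, that would be consistent with the .fa file not being written completely, but this is only one possible explanation.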


And when I drop the argument 'splicedAlignment=TRUE' I get:

Performing genomic alignments for 1 samples. See progress in the log file:
/sps/inter/isc/uborello/input/QuasR_log_14c9af4e6be.txt
sh: line 1:  7369 Aborted                 (core dumped)
'/sps/inter/isc/uborello/software/lib64/R/library/Rbowtie/bowtie'
'/sps/inter/isc/uborello/software/lib64/R/library/BSgenome.Mmusculus.UCSC.mm10.Rbowtie/alignmentIndex/bowtieIndex'
'/sps/inter/isc/uborello/input/133het.fastq' -m 1 --best --strata
--phred33-quals -S -p 4 '/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam' 2>&1
Error in checkForRemoteErrors(val) :
  one node produced an error: Error on ccage014 processing sample
/sps/inter/isc/uborello/input/133het.fastq : bowtie failed to perform the
alignments
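
To narrow this down along the lines suggested in the quoted message below (rerun on a subset of the data, without a cluster object), one could first cut the FASTQ file down to its first few thousand reads. This is a minimal base-R sketch, assuming an uncompressed FASTQ file with exactly four lines per record. The helper name `subsetFastq` and the file names are hypothetical, and the demo input is a tiny made-up file, not the real sample.

```r
# Hypothetical helper: copy the first 'nReads' records of an uncompressed
# FASTQ file (assumed to have exactly 4 lines per record) to 'outFile'.
subsetFastq <- function(inFile, outFile, nReads = 10000L) {
    lines <- readLines(inFile, n = 4L * nReads)
    # Drop a trailing incomplete record, if there is one.
    nKeep <- (length(lines) %/% 4L) * 4L
    writeLines(lines[seq_len(nKeep)], outFile)
    invisible(nKeep %/% 4L)  # number of reads written
}

# Demo on a made-up file with three reads; keep the first two.
demoIn <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "TTGA", "+", "IIII",
             "@r3", "GGCC", "+", "IIII"), demoIn)
demoOut <- tempfile(fileext = ".fastq")
nWritten <- subsetFastq(demoIn, demoOut, nReads = 2L)
print(nWritten)
```

A sample file pointing at the reduced FASTQ could then be passed to qAlign() without the clObj argument, so that any error surfaces directly in the main R session instead of being relayed from a worker.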





Thank you

Ugo



> From: Michael Stadler <michael.stadler at fmi.ch>
> Date: Tue, 22 Oct 2013 11:50:05 +0200
> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
> Subject: Re: [BioC] QuasR on Linux Cluster
> 
> I can see from the intermediate files that SpliceMap was stopped halfway
> through, before it could create the single sam file with spliced alignments.
> 
> QuasR tries to detect such cases in the child R process (one of the R
> processes spawned in your cluster object) and throws an error with a
> descriptive message. However, you do not get this error message. Rather,
> you get an error indicating that the parent R process lost its
> connection to the child R process.
> 
> It's hard to diagnose this from afar, so I'll have to guess wildly. Could
> it be that the child R process is terminated and therefore neither able
> to signal failure, nor to communicate with the parent R process? Can you
> give more details about your setup, e.g. if you are running some batch
> or queueing system that controls job execution?
> 
> Another thing that may help narrow down the problem is to rerun
> qAlign() on a subset of the dataset, or without a cluster object. It may
> also help to know a bit more about the sample you try to analyse (read
> length, read number, sequence file format).
> 
> Michael
> 
> 
> 
> 
> On 22.10.2013 10:43, Ugo Borello wrote:
>> Dear Michael,
>> I think that the disk space is not an issue; anyway, I will double check
>> with the administrator.
>> 
>> I used 4 nodes and QuasR stopped at the .sam file. See the attached output
>> files.
>> 
>> When I use fewer than 4 nodes, it stops at the beginning of the process:
>> 
>> [1] "Writing BSgenome to disk on ccwsge0144 :
>> /scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"
>> 
>> 
>> 
>> What am I missing?
>> 
>> Thank you
>> 
>> Ugo
>> 
>> 
>> 
>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>> Date: Mon, 21 Oct 2013 17:48:53 +0200
>>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>> 
>>> Your cluster object seems functional now.
>>> 
>>> Another possible problem could be available disk space in R's tempdir().
>>> It is used by qAlign to temporarily store the uncompressed fastq files,
>>> the sam files and the bam files (and thus needs several-fold more free
>>> capacity than the size of your fastq.gz files). For more information,
>>> see vignette section 4.1 "File storage locations".
>>> 
>>> If tempdir() is too small, you can redirect R's tempdir() by setting
>>> the TMPDIR environment variable, or just for one qAlign call by using
>>> the "cacheDir" parameter of qAlign.
>>> 
>>> If you are sure that disk space is not the issue, could you give qAlign()
>>> another try, using a cluster object with only 4 nodes to avoid any
>>> memory issues?
>>> 
>>> Michael
>>> 
>>> 
>>> On 21.10.2013 15:09, Ugo Borello wrote:
>>>> Thank you Michael,
>>>> My bad, I am not able to find the QuasR_log at the moment. Anyway, the last
>>>> step was the .sam file. QuasR did not proceed to convert the .sam file
>>>> to a .bam file.
>>>> Attached is some other info on the running job before it died.
>>>> It refers to a case where cl <- makeCluster(1).
>>>> 
>>>> 
>>>> I ran your test and got:
>>>>> library(parallel)
>>>>> cl<- makeCluster(detectCores())
>>>>> info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>> info
>>>> [[1]]
>>>>                              sysname                              release
>>>>                              "Linux"                 "2.6.18-348.3.1.el5"
>>>>                              version                             nodename
>>>> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>>>>                              machine                                login
>>>>                             "x86_64"                            "unknown"
>>>>                                 user                       effective_user
>>>>                           "uborello"                           "uborello"
>>>> 
>>>> The same for the 32 nodes.
>>>> 
>>>> Then I ran:
>>>>> library(parallel)
>>>>> type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>>>>> type
>>>> [1] "PSOCK"
>>>>> cores <- getOption("mc.cores", detectCores())
>>>>> cl <- makeCluster(cores, type=type)
>>>>> cl
>>>> socket cluster with 32 nodes on host 'localhost'
>>>>> results <- parLapply(cl, 1:100, sqrt)
>>>>> sum(unlist(results))
>>>> [1] 671.4629
>>>>> stopCluster(cl)
>>>> 
>>>> I don't know if this could help.
>>>> 
>>>> Any suggestions?
>>>> 
>>>> Ugo
>>>> 
>>>> 
>>>> 
>>>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>>>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>>>>> To: <bioconductor at r-project.org>
>>>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>>> 
>>>>> Hi Ugo,
>>>>> 
>>>>> On 18.10.2013 13:56, Ugo Borello wrote:
>>>>>> Hi all,
>>>>>> I am trying to use QuasR on a Linux cluster: 1 machine, multiple cores.
>>>>>> 
>>>>>> I run:
>>>>>> library(QuasR)
>>>>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>>>> 
>>>>>> cl <- makeCluster(1)
>>>>>> 
>>>>>> sampleFile <- "sampleFile.txt"
>>>>>> 
>>>>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>>>> 
>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>> clObj=cl)
>>>>>> 
>>>>>> And I get
>>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>> clObj=cl)
>>>>>> alignment files missing - need to:
>>>>>>     create 1 genomic alignment(s)
>>>>>> Testing the compute nodes...OK
>>>>>> Loading QuasR on the compute nodes...OK
>>>>>> Available cores:
>>>>>> nodeNames
>>>>>> ccwsge0155
>>>>>>          1
>>>>>> Performing genomic alignments for 1 samples. See progress in the log
>>>>>> file:
>>>>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>>>>> Error in unserialize(node$con) : error reading from connection
>>>>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>>>>> Execution halted
>>>>> 
>>>>> The error that you get is not created within QuasR; my guess is that it
>>>>> comes from the "parallel" package, indicating that something goes wrong
>>>>> when using your cluster object "cl".
>>>>> 
>>>>> I would suggest testing whether your cluster object works fine. It would
>>>>> help to know if the error message appears immediately after you call
>>>>> qAlign(), or if it takes some time to process. Also, it would be great
>>>>> to see the content of the QuasR log file.
>>>>> 
>>>>> Here is a simple test you could try to check your cluster
>>>>> object/connection:
>>>>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>> 
>>>>> As a result, you should get Sys.info() output from each of the cluster
>>>>> nodes.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> I also tried to modify the multicore option
>>>>>> 
>>>>>> cl <- makeCluster(detectCores())
>>>>>> 
>>>>>> And my job is killed because it uses more memory (Max vmem = 17.118G)
>>>>>> than allowed (16G)
>>>>> With splicedAlignment=TRUE, QuasR will run SpliceMap for aligning your
>>>>> reads, which may require several GB of memory per node in your cluster
>>>>> object. You can avoid the memory overflow by reducing the number of
>>>>> nodes in your cluster object, e.g. by:
>>>>> 
>>>>> cl <- makeCluster(4)
>>>>> 
>>>>> which should run through on your machine with 16GB of memory.
>>>>> 
>>>>> Best,
>>>>> Michael
>>>>> 
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>


