[BioC] Re: Can an R script be run through a cron job?

Cei Abreu-Goodger cei at ebi.ac.uk
Fri Nov 20 16:10:02 CET 2009


May I suggest the following:

1) First get all unique Ensembl transcript IDs.

2) If there are too many, split them into groups of ~1,000-5,000 IDs (I don't
know what the optimum would be).

3) For each group of IDs, use getSequence() to retrieve the 3'UTRs.

4) rbind() the results, check them, and save.
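
A minimal sketch of these four steps in R, offered as an illustration only. It assumes the biomaRt package, an existing Mart object `mart` (for example from useMart("ensembl", dataset = "hsapiens_gene_ensembl")), and a character vector of Ensembl transcript IDs. The helper names `chunk_ids`, `fetch_3utr_in_chunks` and `make_biomart_fetcher`, and the default chunk size of 2000, are hypothetical choices. The fetching function is passed in as an argument so the chunking logic can be tried without a network connection.

```r
# Split a vector of IDs into consecutive chunks of at most `chunk_size`.
chunk_ids <- function(ids, chunk_size) {
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

# Hypothetical helper: retrieve sequences chunk by chunk and combine them.
# `fetch` must take a character vector of IDs and return a data.frame.
fetch_3utr_in_chunks <- function(ids, fetch, chunk_size = 2000, pause = 0) {
  ids <- unique(ids[!is.na(ids)])             # step 1: unique IDs only
  chunks <- chunk_ids(ids, chunk_size)        # step 2: split into groups
  results <- vector("list", length(chunks))
  for (k in seq_along(chunks)) {              # step 3: one query per group
    results[[k]] <- fetch(chunks[[k]])
    if (pause > 0) Sys.sleep(pause)           # optional pause between queries
  }
  do.call(rbind, results)                     # step 4: rbind the results
}

# Assumed biomaRt-based fetcher (needs network access and a Mart object).
make_biomart_fetcher <- function(mart) {
  function(ids) {
    biomaRt::getSequence(id = ids, type = "ensembl_transcript_id",
                         seqType = "3utr", mart = mart)
  }
}
```

Usage (not run): utr <- fetch_3utr_in_chunks(transcript_ids, make_biomart_fetcher(mart), pause = 5), where `transcript_ids` is a hypothetical vector of ENST IDs; saveRDS(utr, "utr3.rds") then covers the "save" step.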

Cheers,

Cei


mauede at alice.it wrote:
> I have reattached my script. I had attached it to an earlier message that may have been overlooked.
> 
> As you can see, I scan a large data set named hsTargets that contains many target-gene
> transcript IDs, each with a handle to the corresponding miRNA.
> I process this data set one miRNA at a time. That is, I gather all the transcript IDs for the current miRNA
> and query biomaRt for the 3'UTRs of all those transcripts, passing their ENST IDs to the query as a vector input parameter. So I do use the vectorized capabilities of R, don't I?
> 
> My mistake is keeping the connection to biomaRt open while processing as many miRNAs as I can.
> I acknowledge that I have to improve my script: it should catch the exception, delete the file currently being written (as it will generally be incomplete), and exit gracefully.
> I also have to make my script pause and disconnect from biomaRt regularly, to avoid hammering the
> service.
> My process could even terminate itself after saving its current state, instead of sleeping.
> However, I would then need to set up a task scheduler to restart it some time later...
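
A minimal sketch, in R, of the save-state-and-restart scheme described above. It assumes the work is a vector of miRNA names, that each miRNA's result is written to its own file in `out_dir`, and that finished miRNAs are recorded in a checkpoint file; the function name `run_batch`, the file layout and the `max_per_run` limit are all hypothetical. On error the partial file is deleted and the function stops, so a scheduler (for example cron starting the script with Rscript) can simply run it again later and it will skip the miRNAs already done.

```r
# Hypothetical resumable batch runner.
# `mirnas`: character vector of miRNA names to process.
# `fetch`: function(mirna) returning a data.frame (e.g. a biomaRt query).
run_batch <- function(mirnas, fetch, out_dir, max_per_run = 50, pause = 0) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  ckpt <- file.path(out_dir, "done.rds")
  done <- if (file.exists(ckpt)) readRDS(ckpt) else character(0)
  todo <- head(setdiff(mirnas, done), max_per_run)  # resume where we stopped

  for (m in todo) {
    out_file <- file.path(out_dir, paste0(m, ".rds"))
    ok <- tryCatch({
      saveRDS(fetch(m), out_file)
      TRUE
    }, error = function(e) {
      message("Failed on ", m, ": ", conditionMessage(e))
      if (file.exists(out_file)) file.remove(out_file)  # drop partial output
      FALSE
    })
    if (!ok) break                     # stop gently; rerun later to resume
    done <- c(done, m)
    saveRDS(done, ckpt)                # record progress after each miRNA
    if (pause > 0) Sys.sleep(pause)    # wait between queries
  }
  invisible(setdiff(mirnas, done))     # miRNAs still left to process
}
```

Because progress is saved after every miRNA, the script never has to sleep for long itself: each scheduled run processes at most `max_per_run` miRNAs and exits.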
> 
> Regards,
> Maura
> 
> -----Original Message-----
> From: Kasper Daniel Hansen [mailto:khansen at stat.berkeley.edu]
> Sent: Fri 20/11/2009 15:12
> To: mauede at alice.it
> Cc: Bioconductor List
> Subject: Re: [BioC] Can an R script be run through a cron job?
>  
> Maura
> 
> Unfortunately you never showed us your code, despite repeated requests  
> to do so.  That makes it hard to help (and frankly, ignoring requests  
> for information from people trying to help you is extremely  
> counterproductive).
> 
> Your comments in your last email in the previous thread indicate that you
> have code that essentially does this:
> 
> for(i in 1:100)
>    getBM(...)
> 
> If this is true (which we would know if we could see the code), this is
> why your script fails.  There are two problems with this: (1) you are
> not using the vectorized capabilities of R, but, more importantly, (2)
> you are sending many requests to BioMart, and such behaviour
> may get your IP address banned temporarily.  They don't
> like people hammering their services with repeated requests.
> 
> Instead, you should create a query that asks for all your
> results in one request.  That should be easy to write, and will
> be much faster.  You might think that processing the output is
> slightly harder, but that is the thing to do (and with more R
> experience, processing a big output is actually easier).
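
As an illustration of this advice, a sketch in R that sends a single getBM() request for every distinct transcript and then splits the combined result back per miRNA. It assumes hsTargets is a data.frame with columns named `mirna` and `transcript_id` (guesses; adjust to the real column names) and an existing Mart object `mart`; `query_all_3utr` and `split_by_mirna` are hypothetical helper names, and the attribute and filter names should be checked against listAttributes(mart) and listFilters(mart).

```r
# One request for all transcripts (assumed attribute/filter names; verify
# with biomaRt::listAttributes(mart) and biomaRt::listFilters(mart)).
query_all_3utr <- function(transcript_ids, mart) {
  biomaRt::getBM(attributes = c("ensembl_transcript_id", "3utr"),
                 filters = "ensembl_transcript_id",
                 values = unique(transcript_ids),
                 mart = mart)
}

# Attach the sequences to each target row, then split the table per miRNA.
split_by_mirna <- function(targets, utr) {
  merged <- merge(targets, utr,
                  by.x = "transcript_id", by.y = "ensembl_transcript_id")
  split(merged, merged$mirna)
}
```

Usage (not run): per_mirna <- split_by_mirna(hsTargets, query_all_3utr(hsTargets$transcript_id, mart)) gives a list with one data.frame per miRNA. A transcript shared by several miRNAs is fetched only once, which is one reason the single query is cheaper than a loop of requests.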
> 
> Regarding your actual question in this email, you seem to be very
> confused about the meaning of a batch job.  This term has many
> different interpretations (not related to R), so it is hard to google
> for.  What you are specifically asking for has everything to do with
> which operating system you are using (Windows, Linux, OS X) and nothing
> to do with R.
> 
> Kasper
> 
> 
> On Nov 19, 2009, at 18:24, <mauede at alice.it> wrote:
> 
>> I am running a script that extracts many long strings from remote
>> databases.
>> Every now and then the remote database gets out of sync and closes
>> the connection.
>> I have been advised to implement an R script that queries the
>> database in batch mode.
>> I have never run an R script in batch mode. I think I have to use R
>> CMD BATCH or something similar.
>> Given the amount of data I am extracting, I am concerned about
>> having to parse a huge data file looking for the
>> information I need.
>> The least painful modification would be to run the R script
>> as is, but through a cron job, so that the script
>> sleeps at a fixed interval and, when
>> awakened, resumes from where it was interrupted.
>> Is such a scheme doable in R? If so, what are the most
>> important commands to make a script sleep and wake up
>> on a regular basis?
>>
>> Thank you in advance,
>> Maura
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
