[BioC] R: Can an R script be run through a cron job ?

Steffen Moeller steffen_moeller at gmx.de
Fri Nov 20 16:54:59 CET 2009


Hello, going back to the original question, I just wanted to point to littler:
http://dirk.eddelbuettel.com/code/littler.html
It works just fine for me, also together with the getopt package.
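
For illustration, a minimal sketch of a script that cron could start directly. It assumes Rscript (which ships with R) and base R's commandArgs() for argument handling; with littler the shebang would point to `r` instead and the arguments arrive in `argv`. The file name, paths, schedule, and the helper names parse_args() and main() are hypothetical.

```r
#!/usr/bin/env Rscript
# Hypothetical file name: fetch_utrs.R
# Example crontab entry (an assumption, adjust paths and schedule), every night at 02:00:
#   0 2 * * * /usr/bin/Rscript /path/to/fetch_utrs.R /path/to/output >> /path/to/fetch_utrs.log 2>&1

# Read positional arguments: the first one is the output directory,
# defaulting to the current directory when none is given.
parse_args <- function(args) {
  out_dir <- if (length(args) >= 1) args[[1]] else "."
  list(out_dir = out_dir)
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- parse_args(args)
  message(format(Sys.time()), " run started, output in ", opts$out_dir)
  # ... the actual work (for example the biomaRt queries) would go here ...
  invisible(opts)
}

# Run main() only when executed as a script, not when sourced interactively.
if (!interactive()) main()
```

Because cron starts a fresh R process each time, the script itself does not need to sleep; it only has to know where the previous run stopped.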

Steffen

Cei Abreu-Goodger wrote:
> may I suggest the following:
> 
> 1) First get all unique Ensembl transcript IDs
> 
> 2) If there are too many, split into groups of ~1-5 thousand (I don't
> know what the optimum would be)
> 
> 3) For each group of ids, use getSequence() to retrieve the 3'UTR.
> 
> 4) rbind the results, check, save
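
The four steps above could be sketched roughly as follows. This is only an illustration: split_ids() and fetch_in_chunks() are hypothetical helper names, the chunk size and pause are guesses, and the biomaRt call is shown in comments because it needs the biomaRt package and network access.

```r
# Steps 1 and 2: de-duplicate the IDs and split them into chunks
# of at most `chunk_size` elements.
split_ids <- function(ids, chunk_size = 1000) {
  ids <- unique(ids)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

# Steps 3 and 4: fetch each chunk with `fetch_fun` and rbind the results.
# `fetch_fun` must take a character vector and return a data.frame.
fetch_in_chunks <- function(ids, fetch_fun, chunk_size = 1000, pause = 0) {
  chunks <- split_ids(ids, chunk_size)
  results <- lapply(chunks, function(chunk) {
    res <- fetch_fun(chunk)
    Sys.sleep(pause)  # wait between requests to go easy on the remote service
    res
  })
  do.call(rbind, results)
}

# Assumed biomaRt usage (not run here):
# library(biomaRt)
# mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_utr <- function(ids) {
#   getSequence(id = ids, type = "ensembl_transcript_id",
#               seqType = "3utr", mart = mart)
# }
# utrs <- fetch_in_chunks(enst_ids, fetch_utr, chunk_size = 1000, pause = 5)
```

Passing the fetching function as an argument keeps the chunking logic testable without a connection to the server.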
> 
> Cheers,
> 
> Cei
> 
> 
> mauede at alice.it wrote:
>> I have reattached my script. I had attached it to an earlier message that
>> may have been overlooked.
>>
>> As you can see for yourself, I scan a big data set, named hsTargets, that
>> contains many target gene transcript IDs, each linked to its
>> related miRNA.
>> I process this data set one miRNA at a time. That is, I gather all
>> the transcript IDs for the current miRNA
>> and query biomaRt for the 3'UTRs of all transcripts whose
>> ENST IDs are in a vector that I pass as an input parameter to the query.
>> Therefore I do use the vectorized capabilities of R, don't I?
>>
>> My mistake is keeping the connection to biomaRt open while
>> processing as many miRNAs as I can.
>> I acknowledge that I have to improve my script so that it catches the
>> exception, deletes the file currently being written
>> (as in general it will be incomplete), and exits gracefully.
>> Then I have to make my script pause and disconnect from biomaRt
>> regularly to avoid hammering the provided
>> service. Eventually my process could even terminate instead of
>> sleeping, after saving its current state. However, I would then need to set up
>> the task scheduler to restart it some time later ...
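
A rough sketch of the plan described above: catch the error, delete the incomplete file, stop gently, and keep a record of what is finished so that a later run can resume. The function name process_with_checkpoint(), the one-file-per-item layout, and the plain-text state file are assumptions made for illustration, not taken from the attached script.

```r
# Process items one by one, remembering finished ones in a checkpoint file
# so that a later run (for example one started by cron) resumes where this one stopped.
process_with_checkpoint <- function(items, work_fun, out_dir, state_file) {
  done <- if (file.exists(state_file)) readLines(state_file) else character(0)
  for (item in setdiff(items, done)) {
    out_file <- file.path(out_dir, paste0(item, ".txt"))
    ok <- tryCatch({
      writeLines(work_fun(item), out_file)
      TRUE
    }, error = function(e) {
      # The output may be incomplete: remove it and stop gently.
      if (file.exists(out_file)) file.remove(out_file)
      message("Stopping at ", item, ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) return(invisible(FALSE))
    # Record the item only after its file has been written completely.
    cat(item, "\n", sep = "", file = state_file, append = TRUE)
  }
  invisible(TRUE)
}
```

Here `work_fun` stands for whatever retrieves the data for one miRNA and returns it as a character vector; the function returns FALSE when it stopped early, so the calling script can exit and leave the rest to the next scheduled run.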
>>
>> Regards,
>> Maura
>>
>>
>> -----Original Message-----
>> From: Kasper Daniel Hansen [mailto:khansen at stat.berkeley.edu]
>> Sent: Fri 20/11/2009 15:12
>> To: mauede at alice.it
>> Cc: Bioconductor List
>> Subject: Re: [BioC] Can an R script be run through a cron job ?
>>  
>> Maura
>>
>> Unfortunately you never showed us your code, despite repeated
>> requests to do so. That makes it hard to help (and frankly, ignoring
>> requests for information from people trying to help you is extremely
>> counterproductive).
>>
>> Your comments in your last email in the last thread indicate that
>> you have code that essentially does this
>>
>> for(i in 1:100)
>>    getBM(...)
>>
>> If this is true (which we would know if we could see the code), this is
>> why your script fails. There are two problems with this: (1) you are
>> not using the vectorized capabilities of R, but more importantly (2)
>> you are sending many requests to Biomart, and such behaviour
>> typically might get your IP address banned temporarily. They don't
>> like people hammering their services with repeated requests.
>>
>> Instead you should create a query that essentially asks for all your
>> return objects in one request. That should be easy to write, and
>> will be much faster. You might think that processing the output is
>> slightly harder, but that is the thing to do (and with more R
>> experience, processing a big output is actually easier).
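
As a sketch of the one-request approach, under assumptions: `targets` stands in for the poster's hsTargets data set with hypothetical column names `mirna` and `enst`, and fetch_all_then_split() is an illustrative helper, not something from the thread. The getBM() call is shown in comments because it needs the biomaRt package and network access.

```r
# `targets`: data.frame with columns `mirna` and `enst` (assumed names).
# `fetch_fun`: takes a character vector of transcript IDs and returns a
# data.frame with an `enst` column plus the retrieved fields.
fetch_all_then_split <- function(targets, fetch_fun) {
  seqs <- fetch_fun(unique(targets$enst))      # one request for all IDs
  merged <- merge(targets, seqs, by = "enst")  # attach results to each miRNA/target pair
  split(merged, merged$mirna)                  # one data.frame per miRNA
}

# Assumed biomaRt usage (not run here):
# library(biomaRt)
# mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_fun <- function(ids) {
#   res <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id"),
#                filters = "ensembl_transcript_id", values = ids, mart = mart)
#   names(res)[names(res) == "ensembl_transcript_id"] <- "enst"
#   res
# }
# per_mirna <- fetch_all_then_split(hsTargets, fetch_fun)
```

The per-miRNA processing then happens locally on the split result instead of triggering one remote query per miRNA.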
>>
>> Regarding your actual question in this email, you seem to be very
>> confused regarding the meaning of a batch job. This word has many
>> different interpretations (not related to R), so it is hard to google
>> for. What you are specifically asking for has everything to do with
>> what operating system you are using (Windows, Linux, OS X) and
>> nothing to do with R.
>>
>> Kasper
>>
>>
>> On Nov 19, 2009, at 18:24, <mauede at alice.it> wrote:
>>
>>> I am running a script that extracts many long strings from remote
>>> databases.
>>> Every now and then the remote database gets out of sync and closes
>>> the connection.
>>> I have been advised to implement an R script that queries the
>>> database in batch mode.
>>> I have never run an R script in batch mode. I think I have to use R
>>> CMD BATCH or something similar.
>>> Given the amount of data I am extracting, I am concerned about
>>> having to parse a huge data file looking for the
>>> information I need.
>>> The least painful modification would be to run the R script
>>> as is, but through a cron job, so that the script
>>> sleeps at a set interval and, when
>>> awakened, resumes from where it was interrupted.
>>> Is such a scheme doable in R? If it is, what are the most
>>> important commands to make a script sleep and wake up
>>> on a regular basis?
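
For the sleeping part, base R offers Sys.sleep(). A small sketch of retrying a failing call with a pause in between, assuming the failure surfaces as an ordinary R error; retry_with_pause() is a hypothetical helper name and the attempt count and pause length are arbitrary.

```r
# Call `fun()` up to `attempts` times, sleeping `pause` seconds after a failure.
retry_with_pause <- function(fun, attempts = 3, pause = 1) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(pause)
  }
  stop("all ", attempts, " attempts failed")
}
```

Waking up on a regular basis is better left to the operating system's scheduler (cron on Linux), which simply starts the script again at the configured times.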
>>>
>>> Thank you in advance,
>>> Maura
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor