[R] runtime on ising model

Thu Oct 28 19:31:42 CEST 2010

On Oct 28, 2010, at 12:20 PM, David Winsemius wrote:

>
> On Oct 28, 2010, at 11:52 AM, Michael D wrote:
>
>> Mike, I'm not sure what you mean about removing foo, but I think the
>> method is sound for diagnosing a program issue, and the results speak
>> for themselves.
>>
>> I did invert my if statement at the suggestion of a CS professor (who
>> also suggested recoding in C, but I'm in an applied math program and
>> haven't had the time to take programming courses, which I know would
>> be helpful).
>>
>> Anyway, with the statement as:
>>
>> if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
>>   # do nothing
>> } else {
>>   q <- q+1
>>   Out[[q]] <- M
>> }
>>
>> run times were back to around 20 minutes.
>
> Have you tried replacing all of those 10^x operations with their
> integer equivalents, c(10000L, 100000L, 1000000L)? Each time through
> the loop you are unnecessarily calling the "^" function 4 times. You
> could also omit the last one, 10^7, during testing, since M at the
> last iteration (k=10^7) would be the final value and you could just
> assign the state of M at the end. So we have eliminated 4*10^7
> unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS
> professor is perhaps used to having the C compiler do all thinking
> of this sort for him.)

Bill Dunlap's suggestion to use "==" instead of %in% cut the time to
1/3 of what it had been, even after the pre-calculation of the integer
values (which only improved the looping times by 30%). The combination
of the two with:
  if (k ==10000L|k==100000L|k==1000000L ) { ... }

... resulted in a speedup by a factor of 12.006/2.523, or about 4.76,
for the interim checking and printing operation using Bill's test suite.
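
For anyone who wants to reproduce the comparison, here is a minimal sketch of the three variants of the test. It is not Bill's test suite: the function names and the iteration count n are placeholders chosen for illustration, and the timings will differ by machine and R version. The last variant uses "||", the scalar short-circuit form, where the line above uses "|"; for a single k both give the same answer.

```r
# Three ways of testing whether k is one of the checkpoint times.
# Function names and n are illustrative assumptions, not taken from the thread.
n <- 1e5L  # kept small so the sketch runs quickly; the thread loops 10^7 times

check_in_pow <- function(n) {
  hits <- 0L
  for (k in seq_len(n)) {
    # "^" and c() are evaluated again on every pass
    if (k %in% c(10^4, 10^5, 10^6, 10^7)) hits <- hits + 1L
  }
  hits
}

check_in_int <- function(n) {
  marks <- c(10000L, 100000L, 1000000L, 10000000L)  # constants built once
  hits <- 0L
  for (k in seq_len(n)) {
    if (k %in% marks) hits <- hits + 1L
  }
  hits
}

check_eq <- function(n) {
  hits <- 0L
  for (k in seq_len(n)) {
    # plain comparisons; 10^7 is left out, as suggested for testing
    if (k == 10000L || k == 100000L || k == 1000000L) hits <- hits + 1L
  }
  hits
}

# The variants must agree on the number of checkpoints hit before timing them.
stopifnot(check_in_pow(n) == check_in_int(n),
          check_in_int(n) == check_eq(n))

print(system.time(check_in_pow(n)))
print(system.time(check_in_int(n)))
print(system.time(check_eq(n)))
```

Raising n toward 10^7 should bring the timings closer to the scale discussed in this thread.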

>
> -- 
> David
>
>> So as best I can tell, something happens in the if statement causing
>> the computer to work ahead, as the professor suggests. I'm no expert
>> on R (and have no desire to try looking at the R source code; it would
>> only confuse me), but if anyone can offer guidance on how the if
>> statement works (Does R try to work ahead? Under what conditions does
>> it try to "work ahead", so I can try to exploit this behavior?) I
>> would greatly appreciate it.
>> If it would require too much knowledge of the computer system to
>> understand, I doubt I would be able to make use of it, but maybe
>> someone else could benefit.
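
On the "work ahead" question: as far as the language semantics go, if() evaluates its condition first and then evaluates only the branch that was selected; the other branch is never run. A small sketch that makes this visible follows. The counter and the function name store_state are made up for illustration and are not from the original program.

```r
# Count how often the body of the if() is actually evaluated.
body_runs <- 0L
store_state <- function() {
  body_runs <<- body_runs + 1L  # side effect shows whether the body ran
  invisible(NULL)
}

for (k in 1:1000) {
  # condition is checked on every pass; the body runs only when it is TRUE
  if (k %in% c(10, 100, 1000)) store_state()
}

print(body_runs)  # 3: the body ran only at k = 10, 100 and 1000
```

If that is right, the body is not a likely source of the slowdown, and the cost of evaluating the condition on every pass is a more plausible candidate.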
>>
>> On Tue, Oct 26, 2010 at 3:24 PM, Mike Marchywka <marchywka at hotmail.com> wrote:
>>
>>> ----------------------------------------
>>>> Date: Tue, 26 Oct 2010 12:53:14 -0400
>>>> From: mike409 at gmail.com
>>>> To: jim at bitwrit.com.au
>>>> CC: r-help at r-project.org
>>>> Subject: Re: [R] runtime on ising model
>>>>
>>>> I have an update on where the issue is coming from.
>>>>
>>>> I commented out the code for "pos[k+1] <- M[i,j]" and the if statement
>>>> for time = 10^4, 10^5, 10^6, 10^7 and the storage, and everything ran
>>>> fast(er). Next I added back in the "pos" statements and still runtimes
>>>> were good (around 20 minutes).
>>>>
>>>> So I'm left with something causing problems in:
>>>
>>> I haven't looked at this since some passing interest in magnetics
>>> decades ago, something about 8-tracks and cassettes, but you have
>>> to be careful with conclusions like " I removed foo and problem
>>> went away therefore problem was foo." Performance issues are often
>>> caused by memory, not CPU limitations. Removing anything with a big
>>> memory footprint could speed things up. IO can be a real bottleneck.
>>> If you are talking about things on minute timescales, look at task
>>> manager and see if you are even CPU limited. Look for page faults
>>> or IO etc. If you really need performance and have a task which
>>> is relatively simple, don't ignore C++ as a way to generate data
>>> points and then import these into R for analysis.
>>>
>>> In short, just because you are focusing on math it doesn't mean
>>> the computer is limited by that.
>>>
>>>
>>>>
>>>> ## Store state at time 10^4, 10^5, 10^6, 10^7
>>>> if( k %in% c(10^4,10^5,10^6,10^7) ){
>>>>   q <- q+1
>>>>   Out[[q]] <- M
>>>> }
>>>>
>>>> Would there be any reason R is executing the statements inside the
>>>> "if" before getting to the logical check?
>>>> Maybe R is written to hope for the best outcome (TRUE) and will just
>>>> throw out its work if the logic comes up FALSE?
>>>> I guess I can always break the for loop up into four parts and store
>>>> the state at the end of each, but that's an unsatisfying solution to me.
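
The "four parts" idea need not mean four copies of the loop. A rough sketch, assuming a vector of checkpoint times and a stand-in update step (the real spin update is not shown in this thread, so the flip below is purely a placeholder, as are the small checkpoint values and the 4 x 4 matrix):

```r
# Run the simulation in segments and store the state after each segment,
# so the inner loop contains no per-iteration checkpoint test at all.
checkpoints <- c(10L, 100L, 1000L)   # assumed small values; the thread uses 10^4 ... 10^7
Out <- vector("list", length(checkpoints))
M <- matrix(1L, nrow = 4, ncol = 4)  # assumed toy lattice of +1 spins

start <- 1L
for (q in seq_along(checkpoints)) {
  for (k in start:checkpoints[q]) {
    # placeholder update: flip one deterministically chosen site
    i <- k %% 4L + 1L
    j <- (k %/% 4L) %% 4L + 1L
    M[i, j] <- -M[i, j]
  }
  Out[[q]] <- M                      # state at time checkpoints[q]
  start <- checkpoints[q] + 1L
}
```

The outer loop runs once per checkpoint, so the stored states land in Out without any comparison inside the hot loop.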
>>>>
>>>>
>>>> Jim, I like the suggestion of just pulling one big sample, but since I
>>>> can get the runtimes under 30 minutes just by removing the storage
>>>> piece, I doubt I would see any noticeable changes by pulling large
>>>> sample vectors.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> On Tue, Oct 26, 2010 at 6:22 AM, Jim Lemon  wrote:
>>>>
>>>>> On 10/26/2010 04:50 PM, Michael D wrote:
>>>>>
>>>>>> So I'm in a stochastic simulations class and I'm having issues with
>>>>>> the amount of time it takes to run the Ising model.
>>>>>>
>>>>>> I usually don't like to attach the code I'm running, since it will
>>>>>> probably make me look like a fool, but I figure it's the best way I
>>>>>> can find any bits where I can speed up run time.
>>>>>>
>>>>>> As for the goals of the exercise:
>>>>>> I need the state of the system at time=1, 10k, 100k, 1mill, and
>>>>>> 10mill, and the percentage of vertices with positive spin at all t.
>>>>>>
>>>>>> Just to be clear, I'm not expecting anyone to tell me how to program
>>>>>> this model, because I know what I have works for this exercise, but
>>>>>> it takes far too long to run and I'd like to speed it up by replacing
>>>>>> slow operations wherever possible.
>>>>>>
>>>>> Hi Michael,
>>>>> One bottleneck is probably the sampling. If it doesn't grab too much
>>>>> memory, setting up a vector of the samples (maybe a million at a time
>>>>> if 10 million is too big; you might be able to rewrite your sample
>>>>> vector when you store the state) and using k (and an offset if you
>>>>> don't have one big vector) to index it will give you some speed.
>>>>>
>>>>> Jim
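
Jim's suggestion could look roughly like the sketch below. It assumes each step picks one lattice site (i, j) at random, which is only inferred from the "M[i,j]" mentioned earlier in the thread; the lattice size, step count and block size are made-up values, and the line filling visited is a placeholder for the real spin update.

```r
# Draw the random sites in blocks instead of sampling on every step.
n_side  <- 10L      # assumed lattice side length
n_steps <- 25000L   # assumed number of steps (the thread uses 10^7)
block   <- 10000L   # assumed number of samples drawn at a time

set.seed(1)
visited <- integer(n_steps)
for (k in seq_len(n_steps)) {
  idx <- (k - 1L) %% block + 1L  # offset of step k inside the current block
  if (idx == 1L) {
    # refill both buffers once per block
    i_buf <- sample.int(n_side, block, replace = TRUE)
    j_buf <- sample.int(n_side, block, replace = TRUE)
  }
  i <- i_buf[idx]
  j <- j_buf[idx]
  visited[k] <- (i - 1L) * n_side + j  # placeholder for the update at M[i, j]
}
```

The block size trades memory for the number of sample.int() calls; one big vector of length n_steps is the special case block = n_steps.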
>>>>>
>>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT