[Rd] Why does my RPy2 program run faster on Windows?

Carlos J. Gil Bellosta cgb at datanalytics.com
Wed May 19 15:19:18 CEST 2010


Dear Abhijit,

If you think that table.CAPM is the culprit, you could run the call to
that function in R on both platforms under Rprof and check which part
of the function is the bottleneck.
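
For example, from the rpy2 side you could wrap just that call in Rprof
(a minimal sketch; the output file name "capm_profile.out" is arbitrary,
and it assumes ta and tb are already assigned in the R global
environment, as in your code):

    import rpy2.robjects as robjects
    r = robjects.r

    # start R's sampling profiler, writing samples to a file
    r('Rprof("capm_profile.out")')

    # the call that seems to dominate the runtime
    a = r('table.CAPM(ta, tb)')

    # stop profiling and summarise the samples per function
    r('Rprof(NULL)')
    print(r('summaryRprof("capm_profile.out")'))

Running the equivalent Rprof()/summaryRprof() pair in a plain R session
on both machines and comparing the summaries should show whether the
time goes into table.CAPM itself or into something it calls.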

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


2010/5/19 Abhijit Bera <abhibera at gmail.com>:
> Update: it appears that the time isn't really going into the data conversion.
> Most of it is spent in the CAPM calculation. :( Does anyone know why the CAPM
> calculation would be faster on Windows?
>
> On Wed, May 19, 2010 at 5:51 PM, Abhijit Bera <abhibera at gmail.com> wrote:
>
>> Hi
>>
>> This is my function. It serves an HTML page after the calculations. I'm
>> connecting to an MSSQL DB using pyodbc.
>>
>>     def CAPM(self,client):
>>
>>         r=self.r
>>
>>         cds="1590"
>>         bm="20559"
>>
>>         d1 = []
>>         v1 = []
>>         v2 = []
>>
>>
>>         print "Parsing GET Params"
>>
>>         params=client.g[1].split("&")
>>
>>         for items in params:
>>             item=items.split("=")
>>
>>             if(item[0]=="cds"):
>>                 cds=unquote(item[1])
>>             elif(item[0]=="bm"):
>>                 bm=unquote(item[1])
>>
>>         print "cds: %s bm: %s" % (cds,bm)
>>
>>         print "Fetching data"
>>
>>         t3=datetime.now()
>>
>>         for row in self.cursor.execute(
>>                 "select * from (select * from ( select co_code,"
>>                 "dlyprice_date,dlyprice_close from feed_dlyprice P where"
>>                 " co_code in (%s,%s) ) DataTable PIVOT ("
>>                 " max(dlyprice_close) FOR co_code IN ([%s],[%s]) )"
>>                 " PivotTable ) a order by dlyprice_date" % (cds, bm, cds, bm)):
>>             d1.append(str(row[0]))
>>             v1.append(row[1])
>>             v2.append(row[2])
>>
>>         t4=datetime.now()
>>
>>         t1=datetime.now()
>>
>>         print "Calculating"
>>
>>         d1.pop(0)
>>         d1vec = robjects.StrVector(d1)
>>         v1vec = robjects.FloatVector(v1)
>>         v2vec = robjects.FloatVector(v2)
>>
>>         r1 = r('Return.calculate(%s)' %v1vec.r_repr())
>>         r2 = r('Return.calculate(%s)' %v2vec.r_repr())
>>
>>         tl = robjects.rlc.TaggedList([r1,r2],tags=('Geo','Nifty'))
>>         df = robjects.DataFrame(tl)
>>
>>         ts2 = r.timeSeries(df,d1vec)
>>         tsa = r.timeSeries(r1,d1vec)
>>         tsb = r.timeSeries(r2,d1vec)
>>
>>         robjects.globalenv["ta"] = tsa
>>         robjects.globalenv["tb"] = tsb
>>         robjects.globalenv["t2"] = ts2
>>         a = r('table.CAPM(ta,tb)')
>>
>>         t2=datetime.now()
>>
>>
>>         page = ("<html><title>CAPM</title><body>Result:<br>%s<br>"
>>                 "Time taken by DB:%s<br>Time taken by R:%s<br>"
>>                 "Total time elapsed:%s<br></body></html>"
>>                 % (str(a), str(t4-t3), str(t2-t1), str(t2-t3)))
>>         print "Serving page:"
>>         #print page
>>
>>         self.serveResource(page,"text",client)
>>
>>
>>
>> On Linux
>> Time taken by DB:0:00:00.024165
>> Time taken by R:0:00:05.572084
>> Total time elapsed:0:00:05.596288
>>
>> On Windows
>> Time taken by DB:0:00:00.112000
>> Time taken by R:0:00:02.355000
>> Total time elapsed:0:00:02.467000
>>
>> Why is there such a huge difference in the time taken by R on the two
>> platforms? Am I doing something wrong? This is my first rpy2 program, so I
>> guess it's badly written.
>>
>> I'm loading the following libraries:
>> 'PerformanceAnalytics','timeSeries','fPortfolio','fPortfolioBacktest'
>>
>> I'm using rpy2 2.1.0 and R 2.11.
>>
>> Regards
>>
>> Abhijit Bera
>>
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


