[R] Conditional Weighted Average (ddply or any other function)

Fri Mar 1 20:39:41 CET 2013

See my last post which crossed yours.  

John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 14:33:17 -0500
> To: jrkrideau at inbox.com
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> oops - it should be MKT, I have been playing with a number of data
> sets simultaneously.
> 
> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,  MKT)))
> 
> On Fri, Mar 1, 2013 at 2:23 PM, John Kane <jrkrideau at inbox.com> wrote:
>> Okay I got the data but you seem to have an undefined variable in wavg.
>> You write :
>> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,
>> MKTCAP)))
>> 
>> There is no MKTCAP in the data.frame.  Also there is one too many ) in
>> the equation: I think you mean :
>> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,
>> MKTCAP))
>> 
>> Have you left out a equation that calculates  MKTCAP?
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 13:53:44 -0500
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> The sample size is huge involving 10,000 + firms. I have put a
>>> representative sample using dput ( Name, ticker and country have been
>>> changed so that firms cannot be identified due to proprietary data
>>> set, also EPS is not required and removed from the dataset)
>>> 
>>> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
>>> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
>>> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC", "CTAX",
>>> "INN", "NOB", "SH", "SZ", "WASH", "WILLSON"), class = "factor"),
>>>     Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>>>     8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>>>     6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
>>>     "CT56", "INN12", "NB12", "SH12", "SZ12", "W12", "W15"), class =
>>> "factor"),
>>>     Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>>>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>>>     4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>>>     5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial &
>>> Professional Serv",
>>>     "Energy", "Media", "Retail", "Transportation"), class = "factor"),
>>>     Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>>>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
>>> Discretionary",
>>>     "Energy", "Industrials"), class = "factor"), Country =
>>> structure(c(4L,
>>>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>>>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>>     2L, 2L, 2L), .Label = c("Brazil", "China", "India", "UK"), class =
>>> "factor"),
>>>     FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>>>     1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>>>     5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>>>     2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1",
>>>     "FY-2", "FY-3", "FY0", "FY1", "FY2"), class = "factor"),
>>>     ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>>>     0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>>>     0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>>>     0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>>>     0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>>>     0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>>>     0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>>>     3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>>>     4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>>>     700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>>>     1879700000, 557093400, 224900300, 1634700000, 1443200000,
>>>     3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>>>     7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>>>     6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>>>     4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>>>     513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>>>     )), .Names = c("NAME", "Ticker", "Industry", "Sector", "Country",
>>> "FISCALYEAR", "ROE", "MKT"), class = "data.frame", row.names = c(NA,
>>> -49L))
>>> 
>>> Thanks,
>>> Punit
>>> 
>>> 
>>> 
>>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com> wrote:
>>>> See below
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>>>> To: jrkrideau at inbox.com
>>>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>>>> function)
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>>>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>>>> 
>>>>> and column "Year" is referring to "FISCALYEAR"
>>>>> 
>>>>  Definitely a no-no in R-help.  :)  We really need  some
>>>> representative
>>>> sample data to play with.  See
>>>> https://github.com/hadley/devtools/wiki/Reproducibility for some
>>>> general
>>>> pointers on how to compose a good question.  The fact that you
>>>> included
>>>> the code you are using was excellent but without some data it is
>>>> rather
>>>> useless.
>>>> 
>>>>  The easiest way to supply data  is to use the dput() function.
>>>> Example
>>>> with your file named "testfile":
>>>> dput(testfile)
>>>> Then copy the output and paste into your email.  This is what I did
>>>> with
>>>> your data that I pasted into my email .  I added the dat1  <-  to it.
>>>> 
>>>> For large data sets, you can just supply a representative sample.
>>>> Usually,  dput(head(testfile, 100)) will be sufficient.
>>>> 
>>>> I hope this is of some help.
>>>> 
>>>> 
>>>>> 
>>>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at inbox.com>
>>>>> wrote:
>>>>>> It is not at all clear what you are doing.  You state that the data
>>>>>> set
>>>>>> you are using is what I have called dat1 : see dput form below.
>>>>>> 
>>>>>> As far as I can see there is no numerical value in there.
>>>>>> 
>>>>>> ##===========data set in dput form================#
>>>>>> dat1  <-  structure(list(Name = c("N1", "N1", "N1", "N1", "N1",
>>>>>> "N1",
>>>>>> "N1",
>>>>>>          "N2", "N2", "N2", "N2", "N2", "N2", "N2"), Ticker = c("T1",
>>>>>> "T1",
>>>>>>          "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T2",
>>>>>> "T2",
>>>>>>          "T2"), Sector = c("S1", "S1", "S1", "S1", "S1", "S1", "S1",
>>>>>> "S2",
>>>>>>           "S2", "S2", "S2", "S2", "S2", "S2"), Industry = c("I1",
>>>>>> "I1",
>>>>>> "I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2",
>>>>>>           "I2"), Country = c("C1", "C1", "C1", "C1", "C1", "C1",
>>>>>> "C1",
>>>>>>            "C2", "C2", "C2", "C2", "C2", "C2", "C2"), Year =
>>>>>> c("FY-4",
>>>>>> "FY-3",
>>>>>>           "FY-2", "FY-1", "FY0", "FY1", "FY2", "FY-4", "FY-3",
>>>>>> "FY-2",
>>>>>>           "FY-2", "FY0", "FY2", "FY2"), ROE = c("ROE11", "ROE12",
>>>>>> "ROE13",
>>>>>>           "ROE14", "ROE15", "ROE16", "ROE17", "ROE21", "ROE22",
>>>>>> "ROE23",
>>>>>>           "ROE24", "ROE25", "ROE26", "ROE27"), EPS = c("EPS11",
>>>>>> "EPS12",
>>>>>>           "EPS13", "EPS14", "EPS15", "EPS16", "EPS17", "EPS21",
>>>>>> "EPS22",
>>>>>>           "EPS23", "EPS24", "EPS25", "EPS26", "EPS27"), MKTCAP =
>>>>>> c("MKT11",
>>>>>>           "MKT12", "MKT13", "MKT14", "MKT15", "MKT16", "MKT17",
>>>>>> "MKT21",
>>>>>>           "MKT22", "MKT23", "MKT24", "MKT25", "MKT26", "MKT27")),
>>>>>> .Names
>>>>>> = c("Name",
>>>>>>          "Ticker", "Sector", "Industry", "Country", "Year", "ROE",
>>>>>> "EPS",
>>>>>>          "MKTCAP"), class = "data.frame", row.names = c(NA, -14L))
>>>>>> ## =================end of dataset==================#
>>>>>> 
>>>>>> There is no FISCALYEAR variable that you specifed below
>>>>>> 
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, >
>>>>>>> WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>> 
>>>>>> I think we need a bit more information.
>>>>>> 
>>>>>> John Kane
>>>>>> Kingston ON Canada
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: anandpunit at gmail.com
>>>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>>>> To: r-help at r-project.org
>>>>>>> Subject: [R] Conditional Weighted Average (ddply or any other
>>>>>>> function)
>>>>>>> 
>>>>>>> Hello R community,
>>>>>>> 
>>>>>>> I am computing weighted average statistic by using ddply function:
>>>>>>> 
>>>>>>> My data set is:
>>>>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>>>> 
>>>>>>> with colnames:
>>>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>>>> 
>>>>>>> I want to compute
>>>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>>>> 
>>>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>>>> 
>>>>>>> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
>>>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 4) Weighted ROE based on Country, Industry and  Fiscal Year.
>>>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 
>>>>>>> I tried using ddply function:
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>>> 
>>>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>>>> but this doesn't give me the right answer.
>>>>>>> 
>>>>>>> I could try subseting the data into different sectors and compute
>>>>>>> the
>>>>>>> weighted average which doesn't look like an elegant solution and
>>>>>>> would
>>>>>>> defeat the purpose of ddply
>>>>>>> 
>>>>>>> I coudn't think of properly using melt and cast functions to solve
>>>>>>> this issue. Any help will be highly appreciated.
>>>>>>> 
>>>>>>> Thanks and Regards,
>>>>>>> Punit
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>> 
>>>>>> ____________________________________________________________
>>>>>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>>>>>> http://www.inbox.com/smileys
>>>>>> Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google
>>>>>> Talk™
>>>>>> and most webmails
>>>>>> 
>>>>>> 
>>>> 
>>>> ____________________________________________________________
>>>> FREE ONLINE PHOTOSHARING - Share your photos online with your friends
>>>> and family!
>>>> Visit http://www.inbox.com/photosharing to find out more!
>>>> 
>>>> 
>> 
>> ____________________________________________________________
>> FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on
>> your desktop!
>> Check it out at http://www.inbox.com/marineaquarium
>> 
>>

____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family!
Visit http://www.inbox.com/photosharing to find out more!