[R] Securities earning covariance

Thu Jun 5 17:56:24 CEST 2008

Replace cov(z) with cov(z, use = "pair") is there are missing values
as there are here.

On Thu, Jun 5, 2008 at 11:54 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.
>
>
> Lines <- "SEC_ID          DAY             EARNING
> IT0000001       20070101        5.467
> IT0000001       20070102        5.456
> IT0000001       20070103        4.954
> IT0000001       20070104        3.456
> IT0000002       20070101        1.456
> IT0000002       20070102        1.345
> IT0000002       20070103        1.233
> IT0000003       20070101        0.345
> IT0000003       20070102        0.367
> IT0000003       20070103        0.319
> "
> DF <- read.table(textConnection(Lines), header = TRUE)
> DFs <- split(DF, DF$SEC_ID)
>
> library(zoo)
> f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
> z <- do.call(merge, lapply(DFs, f))
> cov(z) # uses n-1
>
>
> On Thu, Jun 5, 2008 at 11:41 AM,  <ANGELO.LINARDI at bancaditalia.it> wrote:
>> Good morning,
>>
>> I am a new R user and I am trying to learn how to use it.
>> I am trying to solve this problem.
>> I have a dataframe df of daily securities (for a year) earnings as
>> follows:
>>
>> SEC_ID          DAY             EARNING
>> IT0000001       20070101        5.467
>> IT0000001       20070102        5.456
>> IT0000001       20070103        4.954
>> IT0000001       20070104        3.456
>>                    ..........................
>> IT0000002       20070101        1.456
>> IT0000002       20070102        1.345
>> IT0000002       20070103        1.233
>>                ..........................
>> IT0000003       20070101        0.345
>> IT0000003       20070102        0.367
>> IT0000003       20070103        0.319
>>                ..........................
>>
>> And so on: about 800 different SEC_ID and about 180000 rows.
>> I have to calculate the "covariance" for each couple of securities x and
>> y according to the formula:
>>
>> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>>
>> being x' and y' the mean of securities earning in the year, N the number
>> of observations, sx and sy the standard deviation of x and y.
>> To do this I could build a df2 data frame like this:
>>
>> DAY             SEC_ID.x        SEC_ID.y        EARNING.x
>> EARNING.y       x'      y'      sx      sy
>> 20070101        IT0000001       IT0000002       5.467           1.456
>> a       b       aa      bb
>> 20070101        IT0000001       IT0000003       5.467           0.345
>> a       c       aa      cc
>> 20070101        IT0000002       IT0000003       1.456           0.345
>> b       c       bb      cc
>> 20070102        IT0000001       IT0000002       5.456           1.345
>> a       b       aa      bb
>> 20070102        IT0000001       IT0000003       5.456           0.367
>> a       c       aa      cc
>> 20070102        IT0000002       IT0000003       1.345           0.367
>> b       c       bb      cc
>> ........................................................................
>> .......................................................
>>
>> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
>> easily calculate the formula; but the dimensions are too big (the
>> process stops whit an out-of-memory message).
>> Besides partitioning the input and using a loop, are there any smarter
>> solutions (eventually using split and other ways of "subgroup merging"
>> to solve the problem ?
>> Are there any "shortcuts" using statistical built-in functions (e.g.
>> cov, vcov) ?
>> Thank you in advance
>>
>> Angelo Linardi
>>
>>
>>
>> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e non
>> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che cio' non
>> sia espressamente previsto da un accordo scritto.
>> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La preghiamo di
>> comunicarne via e-mail la ricezione al mittente e di distruggerne il contenuto. La
>> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi allegati
>> potrebbe costituire reato. Grazie per la collaborazione.
>> -- E-mails from the Bank of Italy are sent in good faith but they are neither binding on
>> the Bank nor to be understood as creating any obligation on its part except where
>> provided for in a written agreement. This e-mail is confidential. If you have received it
>> by mistake, please inform the sender by reply e-mail and delete it from your system.
>> Please also note that the unauthorized disclosure or use of the message or any
>> attachments could be an offence. Thank you for your cooperation. **
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>