[R] Conditional Sums for Index creation

Mon May 14 11:10:15 CEST 2007

All,
	I guess the idea is identify the number of NAs na(r) in
particular row r, then calculate the 'original' row-sum ors(r) based on
the info on the set of non-missings in the previous row, i.e., na(r-1),
right? 

Gaurav, I'd assume that the data is always for a stock, and the decision
to include a stock in the portfolio would be mine.

# Indexes of all the non-missing in all the rows
ad <- apply(ac,1,function(y)which(!is.na(y)))

Now here, if I try a row-by-row evaluation: 

> sum(ac[6,unlist(ad[5])],na.rm=TRUE)
[1] 208
> 
> sum(ac[11,unlist(ad[10])],na.rm=TRUE)
[1] 130
> 
> 

I get the answer I need! Works exactly well (Please see my mail below
for the expected answer. 151 in row=11 was incorrect, the right answer
is indeed 130. 

However, if I try and use this for all the rows in the data.frame:

ae <- ac

for (r in 2:nrow(ac)){
	print(r)
	ae$sumCorr <- unlist(sum(ac[r,unlist(ad[r-1])],na.rm=TRUE))
}

> head(ae)
  V1 V2 V3 V4 V5 sumCorr
1 27 93 82 NA NA     316
2 37 21 65 NA NA     316
3 57 65 78 NA NA     316
4 91 13 55 NA NA     316
5 20 27 53 NA NA     316
6 90 39 79 26 NA     316
> 

Which is obviously not right. The question is why does ae$sumCorr not
update with the change in row.  IMHO if I have this, then the solution
is nigh! :)

TIA and best,
-Tir

-----Original Message-----
From: Petr PIKAL [mailto:petr.pikal at precheza.cz] 
Sent: Monday, May 14, 2007 2:21 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: r-help at stat.math.ethz.ch
Subject: Odp: [R] Conditional Sums for Index creation

Hi

there probably is not an easy out of box solution. Some use can be in
rle function e.g.

> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$values[1])
   V1    V2    V3    V4    V5 
FALSE FALSE FALSE  TRUE  TRUE 

gives you columns which start with NA and

> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$lengths[1])
V1 V2 V3 V4 V5
 9  6  6  5 10 
>

gives you length of NA consecutive sequence in each column. Then you can

elaborate a condition which will compute rowSums differently for a 
particular row. E.g. when there is a TRUE value take a number for that 
given column and increase it by one. You will get row number for which
you 
need rowSum with and without this particular column.

Regards

Petr Pikal
petr.pikal at precheza.cz

r-help-bounces at stat.math.ethz.ch napsal dne 14.05.2007 08:23:01:

> Hi,
>    Apologies for the long mail. I have a data.frame with columns of
> price/mcap data for a portfolio of stocks, and the date. To get the
> total value of the portfolio on a daily basis, I calculate rowSums of
> the data.frame. 
> 
> > set.seed(1)
> > ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> > ab[1:5,4:5] <- NA
> > ab[6:10,5] <- NA
> > ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> > ac
>    V1 V2 V3 V4 V5
> 1  27 93 82 NA NA
> 2  37 21 65 NA NA
> 3  57 65 78 NA NA
> 4  91 13 55 NA NA
> 5  20 27 53 NA NA
> 6  90 39 79 26 NA
> 7  94 NA NA 48 NA
> 8  66 38 48 77 NA
> 9  63 87 73  8 NA
> 10 NA 34 69 88 NA
> 11 21 48 48 34 24
> 12 18 60 86 84 NA
> 13 69 49 44 35 64
> 14 38 19 24 33 88
> 15 77 83 NA 48 78
> 16 50 67 10 89 80
> 17 72 79 32 86 46
> 18 99 11 52 39 41
> 19 38 72 66 78 81
> 20 78 41 41 96 60
> > 
> 
> Here the rows 1:20 are dates (also in my data.frame). 
> 
> Since some of the prices have NA, the rowSums is made to ignore these
> entries. 
> 
> > rowSums(ac,na.rm=TRUE)
>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
18
> 19  20 
> 202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315
242
> 335 316 
> > 
> 
> Stocks are being added to the portfolio too. So from date=6 (or row=6)
> we have the 4th stock V4, and from date=11, we have the 5th stock V5.
My
> problem is that I need to calculate the rowSums for row=6 (When a new
> stock was added), _with_ and _without_ the new stock. So my answer for
> row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
> original set of stocks (without V4). Similarly, my answer for row=11
> would be 175 for the plain sum, and 175 - 24 = 151 for the original
sum
> (without V5). 
> 
> Basically I'm interested in finding out the value of the portfolio
with
> and without the new stock for the purposes of creating an index. It's
> possible that some stocks my get dropped later, in which case there
> would be an NA series starting for say V1 at row=18 and so on. In that
> case, the aim would be to find the sum at row=18 with and without the
> value of V1. 
> 
> Is there any way I can get the sum over columns, deleting specific
> colums? To get the columns that are NA in any row, I tried (shown for
> the first 12 rows):
> 
> > apply(ac[1:12,],1,function(y)which(is.na(y)))
> 
> Which correctly gives 
> 
> $`1`
> V4 V5 
>  4  5 
> 
> $`2`
> V4 V5 
>  4  5 
> 
> $`3`
> V4 V5 
>  4  5 
> 
> $`4`
> V4 V5 
>  4  5 
> 
> $`5`
> V4 V5 
>  4  5 
> 
> $`6`
> V5 
>  5 
> 
> $`7`
> V2 V3 V5 
>  2  3  5 
> 
> $`8`
> V5 
>  5 
> 
> $`9`
> V5 
>  5 
> 
> $`10`
> V1 V5 
>  1  5 
> 
> $`11`
> integer(0)
> 
> $`12`
> V5 
>  5 
> 
> > 
> 
> But now I'm stuck. I don't how to use this list of indices at each row
> to exclude my columns. 
> 
> Any pointers please? Would such an exercise be easier if I use a
> time-series based object, like a zoo.
> 
> 
> TIA and best,
> -Tir
> 
> Tirthankar Patnaik
> India Strategy
> Citigroup Investment Research
> +91-22-6631 9887
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.