[R] Best and worst values for each date

Thu Sep 26 00:35:51 CEST 2013

Ira,

You may also try ?ddply():

dat2 <- data.frame(S1 = rep(Pred1[, 1], ncol(Pred1) - 1),
                   variable = rep(colnames(Pred1)[-1], each = nrow(Pred1)),
                   Predict = unlist(Pred1[, -1], use.names = FALSE),
                   Actual = unlist(Actual1[, -1], use.names = FALSE),
                   stringsAsFactors = FALSE)  # base-R reshape to long form
identical(dat, dat2)
#[1] TRUE
dat2New <- dat2[!(is.na(dat2$Predict) | is.na(dat2$Actual)), ]  # drop rows with NA in either column
dat3 <- dat2New[order(dat2New$S1, dat2New$Predict), ]  # sort by date, then by prediction
library(plyr)
# per date: five highest predictions (descending) followed by the five lowest, with matching actual values
res2 <- ddply(dat3, .(S1), summarize,
              cbind(c(head(rev(Predict), 5), head(Predict, 5)),
                    c(head(rev(Actual), 5), head(Actual, 5))))  # works for the example data
res2New <- data.frame(S1 = res2[, 1], Predict = res2[, 2][, 1], Actual = res2[, 2][, 2])
res3 <- res2New[res2New$Predict != 0, ]  # drop zero predictions
row.names(res3) <- 1:nrow(res3)
identical(res3, res[, -2])
#[1] TRUE
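To see what the base-R reshape in the first statement does, here is a small sketch on made-up data. The data frames pred_toy and actual_toy are hypothetical; like Pred1 and Actual1, they have a date column S1 followed by one column per stock.

```r
# Hypothetical toy data: a date column S1 plus one column per stock
pred_toy <- data.frame(S1 = c("1/3/2006", "1/4/2006"),
                       A = c(2, NA), B = c(-1, 3), C = c(NA, -2),
                       stringsAsFactors = FALSE)
actual_toy <- data.frame(S1 = c("1/3/2006", "1/4/2006"),
                         A = c(1.5, 0.2), B = c(-0.7, NA), C = c(0.4, -1.1),
                         stringsAsFactors = FALSE)

# Long form: one row per (date, stock) pair, stock columns stacked in order
long_toy <- data.frame(S1 = rep(pred_toy$S1, ncol(pred_toy) - 1),
                       variable = rep(colnames(pred_toy)[-1], each = nrow(pred_toy)),
                       Predict = unlist(pred_toy[, -1], use.names = FALSE),
                       Actual = unlist(actual_toy[, -1], use.names = FALSE),
                       stringsAsFactors = FALSE)

# Keep only rows where both the prediction and the actual value are present
clean_toy <- long_toy[!(is.na(long_toy$Predict) | is.na(long_toy$Actual)), ]
print(clean_toy)
```

Of the six (date, stock) pairs, only three have both values, so three rows remain.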

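However, if some dates have fewer than five positive or five negative predictions, this shortcut will not filter them correctly, so the loop method (or handling the positive and negative values separately with ?ddply) would be more appropriate.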
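The sketch below illustrates the problem with made-up numbers for a single date (base R only; the values are hypothetical). Taking five values from each end and then dropping zeros, as the shortcut does, lets negative predictions into the "top five" when fewer than five are positive. Filtering by sign first avoids this.

```r
# Hypothetical predictions for one date, sorted ascending; only two are positive
Predict <- sort(c(4, 1, 0, -1, -2, -3, -5, -6, -8, -9, -12))

# Shortcut: five largest (descending), then drop zeros
top <- head(rev(Predict), 5)
top <- top[top != 0]
print(top)  # negative values remain among the "top five"

# Sign-aware version: filter by sign first, then take up to five from each side
top_ok <- head(rev(Predict[Predict > 0]), 5)   # highest positive predictions
bottom_ok <- head(Predict[Predict < 0], 5)     # most negative predictions
print(top_ok)
print(bottom_ok)
```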
A.K.

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Ira Sharenow <irasharenow100 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Wednesday, September 25, 2013 4:24 PM
Subject: Re: Best and worst values for each date

Hi,
Maybe you can try this:

obj_name <- load("arun.RData")  # load() returns the names of the objects it restores
Pred1 <- get(obj_name[1])
Actual1 <- get(obj_name[2])
library(reshape2)
# reshape both data frames to long form (one row per date and stock)
dat <- cbind(melt(Pred1, id.vars = "S1"), value2 = melt(Actual1, id.vars = "S1")[, 3])
colnames(dat)[3:4] <- c("Predict", "Actual")
dat$variable <- as.character(dat$variable)  # not strictly needed
dat1 <- dat[!(is.na(dat$Predict) | is.na(dat$Actual)), ]  # drop rows with NA in "Predict" or "Actual"

res <- do.call(rbind, lapply(split(dat1, dat1$S1), function(x) {
  x1 <- x[order(x$Predict), ]  # sort this date's rows by prediction, ascending
  xlow <- if (sum(x1$Predict < 0) < 5) {  # fewer than five negative predictions
    x1[x1$Predict < 0, ]
  } else {
    x1[x1$Predict < 0, ][1:5, ]  # select the first (most negative) five rows
  }
  xhigh <- if (sum(x1$Predict > 0) < 5) {  # fewer than five positive predictions
    x1[x1$Predict > 0, ]
  } else {
    tail(x1[x1$Predict > 0, ], 5)  # the five largest positive predictions
  }
  rbind(xhigh[rev(order(xhigh$Predict)), ], xlow)  # reverse the order of the high values
}))
dim(res)
#[1] 480   4
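The if/else branches above guard against dates with fewer than five positive or five negative predictions. Because head() and tail() on a data frame simply return all rows when fewer than n exist, the same per-date selection can be sketched more compactly. The helper pick_extremes() and the toy data are hypothetical, assuming the long-form layout of dat1 (columns S1, variable, Predict, Actual) with NAs already removed.

```r
# Hypothetical helper: up to n positive predictions (highest first),
# then up to n negative predictions (most negative first), for one date
pick_extremes <- function(x, n = 5) {
  x1 <- x[order(x$Predict), ]                   # ascending by prediction
  high <- tail(x1[x1$Predict > 0, ], n)         # up to n largest positive predictions
  low <- head(x1[x1$Predict < 0, ], n)          # up to n most negative predictions
  rbind(high[rev(seq_len(nrow(high))), ], low)  # highest first, then lowest
}

# Hypothetical toy long-form data for two dates
toy <- data.frame(
  S1 = rep(c("d1", "d2"), each = 8),
  variable = paste0("S", 1:16),
  Predict = c(4, 3, 2, 1, 0, -1, -2, -3,   # d1: four positive, three negative
              9, 8, 7, 6, 5, 4, -1, -2),   # d2: six positive, two negative
  Actual = seq(-1.5, 2, length.out = 16),
  stringsAsFactors = FALSE)

res_compact <- do.call(rbind, lapply(split(toy, toy$S1), pick_extremes))
print(res_compact[, c("S1", "variable", "Predict")])
```

Applied to the real data, the call would be do.call(rbind, lapply(split(dat1, dat1$S1), pick_extremes)), which should select the same rows as the loop above.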

A.K.

________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Wednesday, September 25, 2013 12:55 PM
Subject: Best and worst values for each date

Arun,

I hope you have been doing well.

I have a new problem.

I have two data frames, one for predictions and one for the actual returns.

Each day I act on the stocks with the five highest and the five lowest predicted values. I then want to compare them with the actual values. So I need to subset my two original data frames so that, for each day, only the stocks I want and their values remain. At the end of filtering there will be one data frame for predictions and one data frame for actual values.

Now for an enhancement. NA values must not appear in the reduced data frames, although they make up a large proportion of the original data frames. Each day I need to check that the top five are positive; if not, I need to reduce that number accordingly. Similarly, I need to check that the bottom five are negative. At the end of 50 days each reduced data frame will have 5 * 2 * 50 = 500 rows, but this step may reduce that number.

I attached a smallish file with the two data frames. The real ones have hundreds of columns and over 1,000 rows.

Please aim for simplicity. If the solution is complex, please explain.

Do you want me to use a different email address?

Thanks.

Ira

Example. In the actual data frames the stocks are not laid out this way.

The highlighted stocks are in the first data frames.

Date      Stock  Predict   Actual
1/3/2006  S1           3   -1.943
1/3/2006  S20          4   10.376
1/3/2006  S3           2    8.611
1/3/2006  S4           1    7.465
1/3/2006  S5           0    1.648
1/3/2006  S6          -1     5.36
1/3/2006  S7          -2     4.36
1/3/2006  S8          -3    3.574
1/3/2006  S9          -4    2.748
1/3/2006  S10         -5    1.933
1/3/2006  S11         -6    0.548
1/3/2006  S12         -7    -0.66
1/3/2006  S13         -8   -1.793
1/3/2006  S14         -9   -2.163
1/3/2006  S15        -10   -3.077
1/3/2006  S16        -11   -4.723
1/3/2006  S17        -12   -5.919
1/3/2006  S18        -13   -6.529
1/3/2006  S19        -14   -7.979
1/3/2006  S20        -15   -8.064

After keeping only positive values among the top five predictions and only negative values among the bottom five:
Date      Stock  Predict   Actual
1/3/2006  S1           3   -1.943
1/3/2006  S20          4   10.376
1/3/2006  S3           2    8.611
1/3/2006  S4           1    7.465
1/3/2006  S16        -11   -4.723
1/3/2006  S17        -12   -5.919
1/3/2006  S18        -13   -6.529
1/3/2006  S19        -14   -7.979
1/3/2006  S20        -15   -8.064
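As a check, the example day above can be entered by hand and filtered with the same rule in base R. This is a sketch with hypothetical object names (day, kept); the values are copied from the first table, including the repeated stock label S20.

```r
# The example day, typed in from the first table
day <- data.frame(
  Stock = c("S1", "S20", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10",
            "S11", "S12", "S13", "S14", "S15", "S16", "S17", "S18", "S19", "S20"),
  Predict = c(3, 4, 2, 1, 0, -1, -2, -3, -4, -5,
              -6, -7, -8, -9, -10, -11, -12, -13, -14, -15),
  Actual = c(-1.943, 10.376, 8.611, 7.465, 1.648, 5.36, 4.36, 3.574, 2.748, 1.933,
             0.548, -0.66, -1.793, -2.163, -3.077, -4.723, -5.919, -6.529, -7.979, -8.064),
  stringsAsFactors = FALSE)

sorted <- day[order(day$Predict), ]
high <- tail(sorted[sorted$Predict > 0, ], 5)  # only four positive predictions exist
low <- head(sorted[sorted$Predict < 0, ], 5)   # five most negative predictions
kept <- rbind(high[rev(seq_len(nrow(high))), ], low)  # highest first, then lowest
print(kept)
```

The nine rows kept are the same as in the second table, listed with the highest prediction first.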

Note that on the next day different stocks may be selected. Also, there cannot be any NA in either the Predict or the Actual column.