[R] Create a new column based on values in two other columns

arun smartpink111 at yahoo.com
Mon Sep 9 00:43:09 CEST 2013


HI,
df$NewPrices<- unsplit(lapply(split(df,df$Stocks),function(x) {do.call(rbind,lapply(seq_len(nrow(x)),function(i) {if(x[i,]$Offsets==2)
                                         x[i+2,]$Prices
                                        else if(x[i,]$Offsets==1)
                                         x[i+1,]$Prices
                                         else x[i,]$Prices
                                        })) }),df$Stocks)


 df$NewPrices
#[1] 13 17 12 16 17 18 16 17 18

sqldf(Q2)[,1]
#[1] 13 17 12 16 17 18 16 17 18


I think sqldf() would be faster.


A.K.



----- Original Message -----
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, September 7, 2013 4:11 PM
Subject: [R] Create a new column based on values in two other columns

I am trying to add a column to a data frame. Each day for each stock I 
make a prediction for a future date. Then I need to compare my 
predictions to the actual values. So looking at the first row of data:

For Stock A on 2011-01-01 I predicted that on 2011-01-02 the price would 
be 10.25.

Now I need an ActualPrices column. The first value should be 13.

I solved the problem using sqldf, but I would appreciate some advice on 
how to solve the problem using standard R techniques. The real data 
frame has over 100,000 rows.

I know that the conditions for the correct row can be found in the WHERE 
clause of the SQL query and then I need to look in the Prices column to 
get the value, but I do not know how to do that in standard R.

If another library would be easier, I am open to other ideas.

Dates = as.Date(c(rep("2011-01-01",3), rep("2011-01-02",3), 
rep("2011-01-03",3) ), "%Y-%m-%d")

Stocks = rep(c("A", "B", "C"), 3)

Offsets = c(1,2,0,1,1,1,0,0,0)

Prices = 10:18

PredPrices = 10:18 + 0.25

df = data.frame(Stocks, Dates, Offsets, Prices, PredPrices )

df$NewDates = df$Dates + df$Offsets

df

StocksDates Offsets Prices PredPricesNewDates

1A 2011-01-0111010.25 2011-01-02

2B 2011-01-0121111.25 2011-01-03

3C 2011-01-0101212.25 2011-01-01

4A 2011-01-0211313.25 2011-01-03

5B 2011-01-0211414.25 2011-01-03

6C 2011-01-0211515.25 2011-01-03

7A 2011-01-0301616.25 2011-01-03

8B 2011-01-0301717.25 2011-01-03

9C 2011-01-0301818.25 2011-01-03

library(sqldf)

# To see everything in this small example

Q1 = "SELECT df1.Stocks, df1.Dates, df1.Offsets, df1.Prices, 
df1.PredPrices, df2.Prices AS NewPrices

FROM df AS df1, df AS df2

WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks";

sqldf(Q1)

# To get the column. This what I really want

Q2 = "SELECT df2.Prices AS NewPrices

FROM df AS df1, df AS df2

WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks";

sqldf(Q2)

As I will need to reshape my data so that each row is for a specific 
date, a second starting point is this data frame.

dfWide= reshape(df, direction = "wide", idvar = "Dates", timevar = "Stocks")

> dfWide

Dates Offsets.A Prices.A PredPrices.A NewDates.A Offsets.B Prices.B 
PredPrices.B NewDates.B Offsets.C Prices.C PredPrices.C NewDates.C

1 2011-01-0111010.25 2011-01-0221111.25 2011-01-0301212.25 2011-01-01

4 2011-01-0211313.25 2011-01-0311414.25 2011-01-0311515.25 2011-01-03

7 2011-01-0301616.25 2011-01-0301717.25 2011-01-0301818.25 2011-01-03


    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list