[R] Problems with Merge

Patnaik, Tirthankar tirthankar.patnaik at citi.com
Tue Jun 5 13:56:23 CEST 2007


Hi,
 I have a history dataset, a matrix with about 1590 obs, and 242 cols,
and I need to update this matrix with an 'update' matrix that has about
30 rows, and roughly similar number of columns as the history ds (but
not necessarily equal). The update dataset is read from an Excel ODBC
connection. When I try and merge these datasets, I get counter-intuitive
results.

library(RODBC)
   chn <- odbcConnectExcel(UpdateFile)
   sqlTables(chn)
   UpdateData <- sqlFetch(chn,"MCap243")
   colnames(UpdateData) <- gsub("#",".",colnames(UpdateData))
   close(chn)
   # specify just how many rows we need from the Update file. We'd only
read five
   # rows at a time.
   UpdateRows = 20
   UpdateData <- UpdateData[1:UpdateRows,]

   # Delete Unwanted stocks.
   UpdateData <- UpdateData[,!names(UpdateData) %in% ToBeDeleted]
   x <- tail(UpdateData[c("Date","ABAN.BO")],n=50)
   print(x)
   

Gives x as:

         Date  ABAN.BO
1  2007-04-30 96448.40
2  2007-05-01 96448.40
3  2007-05-02 96448.40
4  2007-05-03 96300.44
5  2007-05-04 93718.52
6  2007-05-05 93718.52
7  2007-05-06 93718.52
8  2007-05-07 92743.82
9  2007-05-08 90374.60
10 2007-05-09 89126.18
11 2007-05-10 87082.47
12 2007-05-11 85493.73
13 2007-05-12 85493.73
14 2007-05-13 85493.73
15 2007-05-14 85033.21
16 2007-05-15 89209.41
17 2007-05-16 89089.19
18 2007-05-17 90472.62
19 2007-05-18 90326.51
20 2007-05-19 90326.51

But when I merge this file with the history dataset, I get the dates
misaligned by one row.

whistory <- merge(rhistory,UpdateData,by.x=rhistory["Date"],all=TRUE)

tail(WHist4[c("Date","ABAN.BO")],n=30)

          Date  ABAN.BO
1581 2007-04-19 83632.60
1582 2007-04-20 85942.00
1583 2007-04-23 88244.00
1584 2007-04-24 90309.50
1585 2007-04-25 92048.00
1586 2007-04-26 92051.70
1587 2007-04-27 95863.10
1588 2007-04-29 96448.40
1589 2007-04-30 96448.40
1590 2007-04-30 96343.40
1591 2007-05-01 96448.40
1592 2007-05-02 96300.44
1593 2007-05-03 93718.52
1594 2007-05-03 96195.60
1595 2007-05-04 93718.52
1596 2007-05-04 93616.50
1597 2007-05-05 93718.52
1598 2007-05-06 92743.82
1599 2007-05-07 90374.60
1600 2007-05-08 89126.18
1601 2007-05-09 87082.47
1602 2007-05-10 85493.73
1603 2007-05-11 85493.73
1604 2007-05-12 85493.73
1605 2007-05-13 85033.21
1606 2007-05-14 89209.41
1607 2007-05-15 89089.19
1608 2007-05-16 90472.62
1609 2007-05-17 90326.51
1610 2007-05-18 90326.51

Any reasons why the dates are shifted by one date? Am I missing some
parameters in the merge statement?

TIA and best,
-Tir



More information about the R-help mailing list