[R] Joining uneven datasets

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu May 29 16:57:00 CEST 2008


Neil Gupta wrote:
> Hello,
>
> I have quite a simple problem that I believe can be solved quite easily. I
> have a dataframe as such:
> Symbol       Date     Time Exchange TickType ReferenceNumber Price Size
> 1 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916044 13387    9
> 2 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916045 13388    1
> 3 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916054 13387    9
> 4 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916055 13388    1
> 5 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916533 13386   39
> 6 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916534 13388    1
>
> I wanted the average of the B's and A's. I wrote this to perform that.
>  NPrice <-
> (YM1207$Price[which(YM1207$TickType=="B")]+YM1207$Price[which(YM1207$TickType=="A")])/2
>
> head(NPrice)
> [1] 13387.5 13387.5 13387.0 13386.5 13386.5 13387.0
>
> Now since NPrice is much smaller than the original dataframe, YM1207 I can
> not just add NPrice to the set.
> How can I put each of those averages back into their corresponding row? I
> would even prefer repeating the values for A's as well..
>
> I would like to do it as such..
>
> Symbol       Date     Time Exchange TickType ReferenceNumber Price Size  NPrice
> 1 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916044 13387    9 13387.5
> 2 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916045 13388    1 13387.5
> 3 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916054 13387    9 13387.5
> 4 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916055 13388    1 13387.5
> 5 3:YMZ7.EC 12/03/2007 08:30:00       EC        B        83916533 13386   39 13387.0
> 6 3:YMZ7.EC 12/03/2007 08:30:00       EC        A        83916534 13388    1 13387.0
>
>   
What can be assumed here? If the alternating B,A pattern is consistent, 
I'd go for (something like)

N <- nrow(YM1207)
ix <- gl(N/2, 2)                         # one factor level per consecutive (B, A) pair
YM1207$NPrice <- ave(YM1207$Price, ix)   # pair mean of Price, repeated on both rows
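To make the idea concrete, here is a minimal self-contained sketch on a toy data frame. The name `d` and the reduced column set are assumptions for illustration only; the prices are the six rows quoted above. It also checks the strict B,A alternation that the approach relies on, and it applies ave() to the Price column, using its default of the group mean.

```r
# Toy data: only the two columns that matter, taken from the six quoted rows.
# The name `d` is hypothetical; substitute the real data frame.
d <- data.frame(
  TickType = c("B", "A", "B", "A", "B", "A"),
  Price    = c(13387, 13388, 13387, 13388, 13386, 13388)
)

N <- nrow(d)

# The pairing only makes sense if rows strictly alternate B, A, B, A, ...
stopifnot(N %% 2 == 0)
stopifnot(all(d$TickType == rep(c("B", "A"), N / 2)))

ix <- gl(N / 2, 2)            # factor 1,1,2,2,3,3: one level per B/A pair
d$NPrice <- ave(d$Price, ix)  # ave() defaults to mean() within each group

d$NPrice
# [1] 13387.5 13387.5 13387.5 13387.5 13387.0 13387.0
```

Because ave() returns a vector of the same length as its input, the result can be assigned straight back as a new column, with each pair's average repeated on both the B row and the A row. If the alternation check fails, the pairs would have to be matched some other way, for instance via ReferenceNumber.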

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


