[R] Computing stats on common parts of multiple dataframes

Gabor Grothendieck ggrothendieck at gmail.com
Tue Feb 13 22:25:49 CET 2007


Sorry, I switched variable names partway through (the first line referred to L, which is never defined).  Here it is again:


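DFs <- list(DF1, DF2, DF3)
n <- min(sapply(DFs, nrow))                   # rows present in every frame
DFs <- lapply(DFs, tail, n)                   # keep the last (most recent) n rows
mats <- lapply(DFs, as.matrix)                # numeric matrices for mapply
pmedian <- function(...) median(c(...))       # parallel median, analogous to pmax
medians <- do.call(mapply, c(pmedian, mats))  # element-wise medians, column-major order
replace(DFs[[1]], TRUE, medians)              # refill a frame of the same shape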
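For a self-contained check, the sketch below types in the data from the original question and runs the same steps. The names DF1, DF2 and DF3 stand for the frames printed as a, b and d; the check.names = FALSE argument is an assumption made only so the hyphenated column names survive as printed. The result should match the answer the question asks for.

```r
# Data transcribed from the question; check.names = FALSE keeps "EUR-USD" as-is.
DF1 <- data.frame(
  "EUR-USD" = c(1.23, 1.22, 1.26, 1.24, 1.21, 1.26, 1.29, 1.25, 1.27, 1.23,
                1.22, 1.24, 1.27, 1.21, 1.23, 1.25, 1.25, 1.23, 1.28, 1.27),
  "NOK-SEK" = c(1.33, 1.43, 1.42, 1.50, 1.36, 1.60, 1.44, 1.36, 1.39, 1.48,
                1.26, 1.29, 1.57, 1.55, 1.35, 1.41, 1.30, 1.11, 1.37, 1.23),
  check.names = FALSE)
DF2 <- data.frame(
  "EUR-USD" = c(1.23, 1.21, 1.28, 1.23, 1.21),
  "NOK-SEK" = c(1.22, 1.36, 1.61, 1.34, 1.22),
  check.names = FALSE)
DF3 <- data.frame(
  "EUR-USD" = c(1.27, 1.23, 1.22, 1.24, 1.27, 1.21,
                1.23, 1.25, 1.25, 1.23, 1.28, 1.27),
  "NOK-SEK" = c(1.39, 1.48, 1.26, 1.29, 1.57, 1.55,
                1.35, 1.41, 1.33, 1.11, 1.37, 1.23),
  check.names = FALSE)

DFs <- list(DF1, DF2, DF3)
n <- min(sapply(DFs, nrow))                   # 5, the size of the smallest frame
DFs <- lapply(DFs, tail, n)                   # align on the most recent rows
mats <- lapply(DFs, as.matrix)
pmedian <- function(...) median(c(...))
medians <- do.call(mapply, c(pmedian, mats))  # one median per cell
result <- replace(DFs[[1]], TRUE, medians)    # refilled column by column
print(result)
```

The replace() step works because assigning a plain vector into all columns of a data frame fills it column-major, which is the same order in which mapply() walks the matrices.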


On 2/13/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Suppose our data frames are called DF1, DF2 and DF3.  First
> find the least number of rows, n, among them.  Create a
> list, DFs, of the last n rows of each data frame and another
> list, mats, which is the same except that each component is a
> matrix.  Create a parallel median function, pmedian, analogous
> to pmax, and mapply it over the matrices.  Finally, put the
> resulting medians back into a data frame of the same shape.
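To illustrate what "a parallel median analogous to pmax" means, here is a small sketch with made-up vectors x, y and z (illustrative only). pmax() returns the element-wise maximum; mapply() applied to pmedian returns the element-wise median. Matrices behave the same way, because mapply() treats them as vectors in column-major order.

```r
# pmax(): element-wise maximum across its vector arguments.
x <- c(1, 5, 3)
y <- c(4, 2, 6)
z <- c(2, 8, 1)
pmax(x, y, z)                     # 4 8 6

# pmedian() sees one value from each argument at a time;
# mapply() feeds it position i of every vector in turn.
pmedian <- function(...) median(c(...))
pmed <- mapply(pmedian, x, y, z)
pmed                              # 2 5 3
```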
>
> n <- min(sapply(L, nrow))
> DFs <- lapply(list(DF1, DF2, DF3), tail, n)
> mats <- lapply(DFs, as.matrix)
> pmedian <- function(...) median(c(...))
> medians <- do.call(mapply, c(pmedian, mats))
> replace(DFs[[1]], TRUE, medians)
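A different technique, not the one used in this thread, is to stack the trimmed matrices into an n x p x k array and take the median over the third dimension with apply(). The helper name common_medians and the toy frames A, B and C are hypothetical, and the sketch assumes every frame has the same numeric columns in the same order.

```r
# Alternative sketch (not from the thread): median over a stacked 3-d array.
common_medians <- function(dfs) {
  n <- min(sapply(dfs, nrow))                        # rows shared by all frames
  mats <- lapply(dfs, function(d) as.matrix(tail(d, n)))
  # unlist() concatenates each matrix column-major, so slice k is frame k
  arr <- array(unlist(mats), dim = c(n, ncol(mats[[1]]), length(mats)))
  med <- apply(arr, c(1, 2), median)                 # n x p matrix of medians
  dimnames(med) <- dimnames(mats[[1]])               # reuse row and column names
  data.frame(med, check.names = FALSE)
}

# Toy data (hypothetical): only the last 2 rows are common to all three.
A <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
B <- data.frame(x = c(9, 0), y = c(5, 50))
C <- data.frame(x = c(7, 7, 7), y = c(1, 2, 3))
res <- common_medians(list(A, B, C))
print(res)
```

This avoids building the argument list for do.call(), at the cost of requiring identical column layouts so that the array slices line up.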
>
>
> On 2/13/07, Murali Menon <feanor0 at hotmail.com> wrote:
> > Folks,
> >
> > I have three dataframes storing some information about
> > two currency pairs, as follows:
> >
> > R> a
> >
> > EUR-USD NOK-SEK
> > 1.23    1.33
> > 1.22    1.43
> > 1.26    1.42
> > 1.24    1.50
> > 1.21    1.36
> > 1.26    1.60
> > 1.29    1.44
> > 1.25    1.36
> > 1.27    1.39
> > 1.23    1.48
> > 1.22    1.26
> > 1.24    1.29
> > 1.27    1.57
> > 1.21    1.55
> > 1.23    1.35
> > 1.25    1.41
> > 1.25    1.30
> > 1.23    1.11
> > 1.28    1.37
> > 1.27    1.23
> >
> >
> >
> > R> b
> > EUR-USD NOK-SEK
> > 1.23    1.22
> > 1.21    1.36
> > 1.28    1.61
> > 1.23    1.34
> > 1.21    1.22
> >
> >
> >
> > R> d
> >
> > EUR-USD NOK-SEK
> > 1.27    1.39
> > 1.23    1.48
> > 1.22    1.26
> > 1.24    1.29
> > 1.27    1.57
> > 1.21    1.55
> > 1.23    1.35
> > 1.25    1.41
> > 1.25    1.33
> > 1.23    1.11
> > 1.28    1.37
> > 1.27    1.23
> >
> > The twist is that these entries correspond to dates where the
> > *last* rows in each frame are today's entries, and so on
> > backwards in time.
> >
> > I would like to create a matrix of medians (a median for each row
> > and for each currency pair), but only for those rows where all
> > dataframes have entries.
> >
> > My answer in this case should look like:
> >
> > EUR-USD NOK-SEK
> >
> > 1.25    1.41
> > 1.25    1.33
> > 1.23    1.11
> > 1.28    1.37
> > 1.27    1.23
> >
> > where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
> >
> > Notice that the output has the same dimensions as the smallest dataframe
> > (in this case 'b').
> >
> > I can do it in a clumsy fashion by first obtaining the number
> > of rows in the smallest matrix, chopping off the top rows
> > of the other matrices to reduce them to this size, then doing a
> > for-loop across each currency pair, row-wise, to create a
> > 3-vector to which I then apply median().
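For comparison, the loop approach described above might look roughly like the following sketch. The helper name loop_medians and the toy frames A, B and C are hypothetical; the trimming keeps the last n rows, matching the convention that the final row is today's entry.

```r
# Sketch of the explicit-loop approach (hypothetical helper name).
loop_medians <- function(dfs) {
  n <- min(sapply(dfs, nrow))                    # size of the smallest frame
  # chop off the top rows so each frame keeps only its last n rows
  trimmed <- lapply(dfs, function(d) d[(nrow(d) - n + 1):nrow(d), , drop = FALSE])
  out <- trimmed[[1]]                            # template with the right shape
  for (j in seq_len(ncol(out))) {                # each currency pair
    for (i in seq_len(n)) {                      # each common row
      vals <- sapply(trimmed, function(d) d[i, j])  # one value per frame
      out[i, j] <- median(vals)
    }
  }
  out
}

# Toy data (hypothetical).
A <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
B <- data.frame(x = c(9, 0), y = c(5, 50))
C <- data.frame(x = c(7, 7, 7), y = c(1, 2, 3))
res <- loop_medians(list(A, B, C))
print(res)
```

It gives the same result as the mapply() version but needs two nested loops and one median() call per cell.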
> >
> > Surely there's a better way to do this?
> >
> > Please advise.
> >
> > Thanks,
> >
> > Murali Menon
> >
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>


