[R] Computing stats on common parts of multiple dataframes

Erik Iverson iverson at biostat.wisc.edu
Tue Feb 13 21:42:21 CET 2007


Murali -

I've come up with something that might with work, with gratutious use of 
the *apply functions.  See ?apply, ?lappy, and ?mapply for how this 
would work.  Basically, just set my.list equal to a list of data.frames 
  you would like included.  I made this to work with matrices first, so 
it does use as.matrix() in my function.  Also, this could be turned into 
a general function so that you could specify a different function other 
than "median".

#Make my.list equal to a list of dataframes you want
my.list <- list(df1,df2)

#What's the shortest?
minrow <- min(sapply(my.list,nrow))
#Chop all to the shortest
tmp <- lapply(my.list, function(x) x[(nrow(x)-(minrow-1)):nrow(x),])

#Do the computation, could change median to mean, or a user defined
#function
matrix(apply(mapply("[",lapply(tmp,as.matrix), 
MoreArgs=list(1:(minrow*2))), 1, median),
        ncol=2)

HTH, whether or not this is any "better" than your for loop solution is 
left up to you.

Erik


Murali Menon wrote:
> Folks,
> 
> I have three dataframes storing some information about
> two currency pairs, as follows:
> 
> R> a
> 
> EUR-USD    NOK-SEK
> 1.23    1.33
> 1.22    1.43
> 1.26    1.42
> 1.24    1.50
> 1.21    1.36
> 1.26    1.60
> 1.29    1.44
> 1.25    1.36
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.30
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> 
> 
> R> b
> EUR-USD    NOK-SEK
> 1.23    1.22
> 1.21    1.36
> 1.28    1.61
> 1.23    1.34
> 1.21    1.22
> 
> 
> 
> R> d
> 
> EUR-USD    NOK-SEK
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> The twist is that these entries correspond to dates where the
> *last* rows in each frame are today's entries, and so on
> backwards in time.
> 
> I would like to create a matrix of medians (a median for each row
> and for each currency pair), but only for those rows where all
> dataframes have entries.
> 
> My answer in this case should look like:
> 
> EUR-USD    NOK-SEK
> 
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
> 
> Notice that the output is of the same dimensions as the smallest dataframe
> (in this case 'b').
> 
> I can do it in a clumsy fashion by first obtaining the number
> of rows in the smallest matrix, chopping off the top rows
> of the other matrices to reduce them this size, then doing a
> for-loop across each currency pair, row-wise, to create a
> 3-vector which I then apply median() on.
> 
> Surely there's a better way to do this?
> 
> Please advise.
> 
> Thanks,
> 
> Murali Menon
> 
> _________________________________________________________________
> Valentine’s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping
> 
> 
> ------------------------------------------------------------------------
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list