[R] Computing stats on common parts of multiple dataframes

Murali Menon feanor0 at hotmail.com
Wed Feb 14 10:32:17 CET 2007


Hi Gabor, Eric,

Your suggestions are certainly more elegant (and involve less typing) than 
my for-loop. Thanks!

Patrick, I see what you mean regarding dates. The problem is that I'm doing 
all sorts of manipulations on the original data, which work best on numeric 
matrices, and the dates just mess it all up. I could, of course, reintroduce 
the date columns for the issue at hand, and do an intersect.

Best wishes,
Murali



>From: Erik Iverson <iverson at biostat.wisc.edu>
>To: Murali Menon <feanor0 at hotmail.com>
>CC: r-help at stat.math.ethz.ch
>Subject: Re: [R] Computing stats on common parts of multiple dataframes
>Date: Tue, 13 Feb 2007 14:42:21 -0600
>
>Murali -
>
>I've come up with something that might with work, with gratutious use of 
>the *apply functions.  See ?apply, ?lappy, and ?mapply for how this would 
>work.  Basically, just set my.list equal to a list of data.frames  you 
>would like included.  I made this to work with matrices first, so it does 
>use as.matrix() in my function.  Also, this could be turned into a general 
>function so that you could specify a different function other than 
>"median".
>
>#Make my.list equal to a list of dataframes you want
>my.list <- list(df1,df2)
>
>#What's the shortest?
>minrow <- min(sapply(my.list,nrow))
>#Chop all to the shortest
>tmp <- lapply(my.list, function(x) x[(nrow(x)-(minrow-1)):nrow(x),])
>
>#Do the computation, could change median to mean, or a user defined
>#function
>matrix(apply(mapply("[",lapply(tmp,as.matrix), 
>MoreArgs=list(1:(minrow*2))), 1, median),
>        ncol=2)
>
>HTH, whether or not this is any "better" than your for loop solution is 
>left up to you.
>
>Erik
>
>
>Murali Menon wrote:
>>Folks,
>>
>>I have three dataframes storing some information about
>>two currency pairs, as follows:
>>
>>R> a
>>
>>EUR-USD    NOK-SEK
>>1.23    1.33
>>1.22    1.43
>>1.26    1.42
>>1.24    1.50
>>1.21    1.36
>>1.26    1.60
>>1.29    1.44
>>1.25    1.36
>>1.27    1.39
>>1.23    1.48
>>1.22    1.26
>>1.24    1.29
>>1.27    1.57
>>1.21    1.55
>>1.23    1.35
>>1.25    1.41
>>1.25    1.30
>>1.23    1.11
>>1.28    1.37
>>1.27    1.23
>>
>>
>>
>>R> b
>>EUR-USD    NOK-SEK
>>1.23    1.22
>>1.21    1.36
>>1.28    1.61
>>1.23    1.34
>>1.21    1.22
>>
>>
>>
>>R> d
>>
>>EUR-USD    NOK-SEK
>>1.27    1.39
>>1.23    1.48
>>1.22    1.26
>>1.24    1.29
>>1.27    1.57
>>1.21    1.55
>>1.23    1.35
>>1.25    1.41
>>1.25    1.33
>>1.23    1.11
>>1.28    1.37
>>1.27    1.23
>>
>>The twist is that these entries correspond to dates where the
>>*last* rows in each frame are today's entries, and so on
>>backwards in time.
>>
>>I would like to create a matrix of medians (a median for each row
>>and for each currency pair), but only for those rows where all
>>dataframes have entries.
>>
>>My answer in this case should look like:
>>
>>EUR-USD    NOK-SEK
>>
>>1.25    1.41
>>1.25    1.33
>>1.23    1.11
>>1.28    1.37
>>1.27    1.23
>>
>>where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
>>
>>Notice that the output is of the same dimensions as the smallest dataframe
>>(in this case 'b').
>>
>>I can do it in a clumsy fashion by first obtaining the number
>>of rows in the smallest matrix, chopping off the top rows
>>of the other matrices to reduce them this size, then doing a
>>for-loop across each currency pair, row-wise, to create a
>>3-vector which I then apply median() on.
>>
>>Surely there's a better way to do this?
>>
>>Please advise.
>>
>>Thanks,
>>
>>Murali Menon
>>
>>_________________________________________________________________
>>Valentine’s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping
>>
>>
>>------------------------------------------------------------------------
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide 
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.

_________________________________________________________________
Turn searches into helpful donations. Make your search count.



More information about the R-help mailing list