[R] ?max (so far...)

Duncan Murdoch murdoch at stats.uwo.ca
Wed Jul 1 21:54:39 CEST 2009


On 01/07/2009 1:26 PM, Mark Knecht wrote:
> On Wed, Jul 1, 2009 at 9:39 AM, Duncan Murdoch<murdoch at stats.uwo.ca> wrote:
>> On 01/07/2009 11:49 AM, Mark Knecht wrote:
>>> Hi,
>>>   I have a data.frame that is date ordered by row number - earliest
>>> date first and most current last. I want to create a couple of new
>>> columns that show the max and min values from other columns *so far* -
>>> not for the whole data.frame.
>>>
>>>   It seems this sort of question is really coming from my lack of
>>> understanding about how R intends me to limit myself to portions of a
>>> data.frame. I get the impression from the help files that the generic
>>> way is that if I'm on the 500th row of a 1000 row data.frame and want
>>> to limit the search max does to rows 1:500  I should use something
>>> like [1:row] but it's not working inside my function. The idea works
>>> outside the function, in the sense I can create tempt1[1:7] and the
>>> max function returns what I expect. How do I do this with row?
>>>
>>>   Simple example attached. hp should be 'highest p', ll should be
>>> 'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"
>>>
>>> Thanks,
>>> Mark
>>>
> <SNIP>
>>> HighLow = function (MyFrame) {
>>>        temp1 <- MyFrame$p[1:row]
>>>        MyFrame$hp <- max(temp1) ## Highest p
>>>        temp1 <- MyFrame$l[1:row]
>>>        MyFrame$ll <- min(temp1) ## Lowest l
>>>
>>>        return(MyFrame)
>>> }
>> You get an error in this function because you didn't define row, so R
>> assumes you mean the function in the base package, and 1:row doesn't make
>> sense.
>>
>> What you want for the "highest so far" is the cummax (for "cumulative
>> maximum") function.  See ?cummax.
>>
>> Duncan Murdoch
>>
> 
> Duncan,
>    OK, thanks. That makes sense, as long as I want the cummax from the
> beginning of the data.frame. (Which is exactly what I asked for!)
> 
>    How would I do this in the more general case if I was looking for
> the cummax of only the most recent 50 rows in my data.frame? What I'm
> trying to get down to is that as I fill in my data.frame I need to be
> able to get a max or min or standard deviation of the previous so many
> rows of data - not the whole column - and I'm just not grasping how to
> do this. It seems like I should be able to create a data set that's
> only a portion of a column while I'm in the function and then take the
> cummax on that, or use it as an input to a standard deviation, etc.?

What you describe might be called a "running max".  The caTools package 
has a runmax function that probably does what you want.

More generally, you can always write a loop.  They aren't necessarily 
fast or elegant, but they're pretty general.  For example, to calculate 
the max of the previous 50 observations (or fewer near the start of a 
vector), you could do

x <- ... some vector ...

result <- numeric(length(x))   # one result per element of x
for (i in seq_along(x)) {
   # window is the current value plus up to 49 earlier ones
   result[i] <- max( x[ max(1, i-49):i ])
}
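
The same idea extends to min, standard deviation and so on if the loop is wrapped in a small function. What follows is only a sketch in base R: the name roll_apply, its arguments, and the sample data are made up for illustration and are not part of any package.

```r
# Hypothetical helper: apply `fun` to the trailing `width` values at each
# position of `x`, using fewer values near the start of the vector.
roll_apply <- function(x, width, fun) {
  vapply(seq_along(x), function(i) {
    fun(x[max(1, i - width + 1):i])
  }, numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)      # made-up data

roll_max <- roll_apply(x, 3, max)   # max of the last 3 values
roll_sd  <- roll_apply(x, 3, sd)    # first element is NA: sd() of a single value

# With a window as long as the vector, this reduces to the cumulative maximum.
same_as_cummax <- identical(roll_apply(x, length(x), max), cummax(x))
```

The results are ordinary vectors, so they can be assigned straight into data.frame columns, just as MyFrame$hp <- cummax(MyFrame$p) would be for the "so far" case. Like the loop, this is not fast for long vectors, which is where a dedicated function such as runmax is likely the better choice.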

Duncan Murdoch



