[R] ?max (so far...)

Mark Knecht markknecht at gmail.com
Wed Jul 1 22:26:26 CEST 2009


On Wed, Jul 1, 2009 at 12:54 PM, Duncan Murdoch<murdoch at stats.uwo.ca> wrote:
> On 01/07/2009 1:26 PM, Mark Knecht wrote:
>>
>> On Wed, Jul 1, 2009 at 9:39 AM, Duncan Murdoch<murdoch at stats.uwo.ca>
>> wrote:
>>>
>>> On 01/07/2009 11:49 AM, Mark Knecht wrote:
>>>>
>>>> Hi,
>>>>  I have a data.frame that is date ordered by row number - earliest
>>>> date first and most current last. I want to create a couple of new
>>>> columns that show the max and min values from other columns *so far* -
>>>> not for the whole data.frame.
>>>>
>>>>  It seems this sort of question is really coming from my lack of
>>>> understanding about how R intends me to limit myself to portions of a
>>>> data.frame. I get the impression from the help files that the generic
>>>> way is that if I'm on the 500th row of a 1000 row data.frame and want
>>>> to limit the search max does to rows 1:500, I should use something
>>>> like [1:row], but it's not working inside my function. The idea works
>>>> outside the function, in the sense I can create tempt1[1:7] and the
>>>> max function returns what I expect. How do I do this with row?
>>>>
>>>>  Simple example attached. hp should be 'highest p', ll should be
>>>> 'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>> <SNIP>
>>>>
>>>> HighLow = function (MyFrame) {
>>>>       temp1 <- MyFrame$p[1:row]
>>>>       MyFrame$hp <- max(temp1) ## Highest p
>>>>       temp1 <- MyFrame$l[1:row]
>>>>       MyFrame$ll <- min(temp1) ## Lowest l
>>>>
>>>>       return(MyFrame)
>>>> }
>>>
>>> You get an error in this function because you didn't define row, so R
>>> assumes you mean the function in the base package, and 1:row doesn't make
>>> sense.
>>>
>>> What you want for the "highest so far" is the cummax (for "cumulative
>>> maximum") function.  See ?cummax.
>>>
>>> Duncan Murdoch
>>>
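
A minimal sketch of the cummax suggestion applied to the HighLow
function quoted above, assuming numeric columns p and l as in the
original post. The sample values in df are invented for illustration.

```r
# "So far" extremes with base R's cummax()/cummin().
# Column names p, l, hp, ll follow the original post; the data are made up.
HighLow <- function(MyFrame) {
  MyFrame$hp <- cummax(MyFrame$p)  # highest p up to and including each row
  MyFrame$ll <- cummin(MyFrame$l)  # lowest l up to and including each row
  MyFrame
}

df <- data.frame(p = c(3, 1, 4, 1, 5), l = c(2, 7, 1, 8, 0))
out <- HighLow(df)
print(out)
```

One thing to watch for: if a column contains an NA, cummax and cummin
return NA from that row onward.
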
>>
>> Duncan,
>>   OK, thanks. That makes sense, as long as I want the cummax from the
>> beginning of the data.frame. (Which is exactly what I asked for!)
>>
>>   How would I do this in the more general case if I was looking for
>> the cummax of only the most recent 50 rows in my data.frame? What I'm
>> trying to get down to is that as I fill in my data.frame I need to be
>> able to get a max or min or standard deviation of the previous so many
>> rows of data - not the whole column - and I'm just not grasping how to
>> do this. It seems like I should be able to create a data set that's
>> only a portion of a column while I'm in the function and then take the
>> cummax on that, or use it as an input to a standard deviation, etc.?
>
> What you describe might be called a "running max".  The caTools package has
> a runmax function that probably does what you want.
>
> More generally, you can always write a loop.  They aren't necessarily fast
> or elegant, but they're pretty general.  For example, to calculate the max
> of the previous 50 observations (or fewer near the start of a vector), you
> could do
>
> x <- ... some vector ...
>
> result <- numeric(length(x))
> for (i in seq_along(x)) {
>  result[i] <- max( x[ max(1, i-49):i ])
> }
>
> Duncan Murdoch
>
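
The loop quoted above could be wrapped in a small helper so the same
trailing window works for max, min, or standard deviation. This is only
a sketch: run_window and its arguments are hypothetical names, not part
of base R or caTools, and the vector x is invented.

```r
# Trailing-window summary: apply FUN to the current value plus up to
# (width - 1) values before it. run_window is a hypothetical helper.
run_window <- function(x, width, FUN) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    start <- max(1, i - width + 1)  # window is shorter near the start
    result[i] <- FUN(x[start:i])
  }
  result
}

x <- c(5, 3, 8, 1, 9, 2)
run_window(x, 3, max)  # 5 5 8 8 9 9
run_window(x, 3, min)  # 5 3 3 1 1 1
run_window(x, 3, sd)   # first element is NA (sd of a single value)
```

With width = 50 this matches the max(1, i-49):i indexing in the loop
above. It is not fast on long vectors, but it accepts any function that
reduces a numeric vector to a single number.
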

Thanks for the pointer. I'll check it out.

Today I've managed to get pretty much all of my Excel spreadsheet
built in R except for some of the charts. It took me a week and a half
in Excel. This is my 3rd full day with R. Charts are next.

I appreciate your help and the help I've gotten from others. Thanks so much.

cheers,
Mark



