[R] max & min values within dataframe

Dennis Murphy djmuser at gmail.com
Mon Nov 14 21:55:33 CET 2011


Groupwise data summarization is a very common task, and it is worth
learning the various ways to do it in R. Josh showed you one way to
use aggregate() from base R (it lives in the stats package) and Michael
showed you one way of using the plyr package to do the same; another
way with plyr would be

ddply(df, .(Patient, Region), summarise, max = max(Score), min = min(Score))

to save on writing an explicit function. Similarly, if you have a
version of R >= 2.11.0, the aggregate() function now has a nice
formula interface, so Josh's code could also be written as

aggregate(Score ~ Patient + Region, data = df, FUN = range)

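with a subsequent renaming of the variables as shown. Because range()
returns both the minimum and the maximum, the Score column of the
result is a two-column matrix.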
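
As a rough sketch of the formula interface, the snippet below rebuilds
the first two patients from the data in the original question and
splits the matrix column produced by FUN = range into two ordinary
columns. The object names res and out, and the column names Min and
Max, are my own choices for illustration, not something from Josh's
post.

```r
# First two patients from the data in the original question
df <- data.frame(
  Patient = c(1, 1, 1, 1, 2, 2, 2, 2),
  Region  = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26),
  Time    = c(28, 126, 100, 191, 1, 2, 4, 7)
)

# FUN = range returns two values per group, so 'Score' in the result
# is a two-column matrix (minimum first, maximum second)
res <- aggregate(Score ~ Patient + Region, data = df, FUN = range)

# Flatten the matrix column into two ordinary, named columns
# ('Min' and 'Max' are illustrative names)
out <- data.frame(res[c("Patient", "Region")],
                  Min = res$Score[, 1],
                  Max = res$Score[, 2])
out
```

The result has one row per Patient/Region combination, which is the
layout asked for in the original question.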

Other packages that can perform this task with ease include doBy,
data.table, remix, Hmisc and, if you are comfortable with SQL, sqldf.
For relative novices, the doBy package is a very nice place to start
because it comes with a well-written vignette and its function names
correspond well with the tasks they perform (e.g., summaryBy(),
transformBy()). The plyr and data.table packages are more general and
more powerful in terms of the types of tasks to which each is suited.
Unlike aggregate() and doBy::summaryBy(), these packages can process
multivariable functions, i.e., functions that take more than one
variable of the data frame as input. As noted above, if you have an SQL
background, sqldf operates on R data frames as though they were SQL
tables, which is advantageous in complex data extraction tasks.
Package remix is useful if you want to organize results into a tabular
form that is reminiscent of SAS.
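
To make the point about multivariable functions a little more
concrete, here is a small sketch with plyr (assumed to be installed)
in which each group summary uses Score and Time together. The output
column names time_at_max and time_at_min are hypothetical; they simply
report the Time value recorded alongside the extreme Score, taking the
first occurrence in case of ties.

```r
library(plyr)  # assumed to be installed

# First two patients from the data in the original question
df <- data.frame(
  Patient = c(1, 1, 1, 1, 2, 2, 2, 2),
  Region  = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26),
  Time    = c(28, 126, 100, 191, 1, 2, 4, 7)
)

# Each time_at_* summary needs two columns at once (Score and Time),
# evaluated within every Patient/Region group
res2 <- ddply(df, .(Patient, Region), summarise,
              min_score   = min(Score),
              max_score   = max(Score),
              time_at_min = Time[which.min(Score)],
              time_at_max = Time[which.max(Score)])
res2
```

aggregate() applies FUN to one column at a time, so a summary like
time_at_max cannot be expressed directly through its FUN argument.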

HTH,
Dennis

On Mon, Nov 14, 2011 at 8:10 AM, B Laura <gm.spam2011 at gmail.com> wrote:
> dear R-team
>
> I need to find the min and max values for each patient in a dataset and
> keep the output as a data frame with the following columns
>  - Patient nr
>  - Region (remains same per patient)
>  - Min score
>  - Max score
>
>
>    Patient Region Score Time
> 1        1      X    19   28
> 2        1      X    20  126
> 3        1      X    22  100
> 4        1      X    25  191
> 5        2      Y    12    1
> 6        2      Y    12    2
> 7        2      Y    25    4
> 8        2      Y    26    7
> 9        3      X     6    1
> 10       3      X     6    4
> 11       3      X    21   31
> 12       3      X    22   68
> 13       3      X    23   31
> 14       3      X    24   38
> 15       3      X    21   15
> 16       3      X    22   24
> 17       3      X    23   15
> 18       3      X    24  243
> 19       3      X    25   77
> 20       4      Y     6    5
> 21       4      Y    22   28
> 22       4      Y    23   75
> 23       4      Y    24   19
> 24       5      Y    23    3
> 25       5      Y    24    1
> 26       5      Y    23   33
> 27       5      Y    24   13
> 28       5      Y    25   42
> 29       5      Y    26   21
> 30       5      Y    27    4
> 31       6      Y    24    4
> 32       6      Y    32    8
>
> So far I could find the min and max values for each patient, but the output
> of it is not (yet) what I need.
>
>> Patient.nr = unique(Patient)
>> aggregate(Score, list(Patient), max)
>  Group.1  x
> 1       1 25
> 2       2 26
> 3       3 25
> 4       4 24
> 5       5 27
> 6       6 32
>
>> aggregate(Score, list(Patient), min)
>  Group.1  x
> 1       1 19
> 2       2 12
> 3       3  6
> 4       4  6
> 5       5 23
> 6       6 24
>
> I would like to do the same, but write this new information (min, max
> values) to a data frame with the following columns
>  - Patient nr
>  - Region (remains same per patient)
>  - Min score
>  - Max score
>
> Can anybody help me with this?
>
> Thanks
> Laura


