[R] Getting minimum value of a column according to a factor column of a dataframe

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu Aug 25 19:26:49 CEST 2022


A slightly slicker solution making use of the handy by() function to
avoid the lapply(split...) construction.

> do.call(rbind,by(df1, df1$Code, \(x)x[which.min(x$Q),]))

       Code  Y  M  D     Q    N    O
41003 41003 81  1 19 0.160 7.17 2.50
41005 41005 79  8 17 0.210 5.50 7.20
41009 41009 79  2 21 0.218 5.56 4.04
41017 41017 79 10 20 0.240 5.30 7.10

This of course ignores the issue of tied minima that Tim Ebert brought
up. Handling ties would require a bit more finagling in the anonymous
function than a bare which.min(), which returns only the first minimum.
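
For what it's worth, here is a minimal sketch of one way to keep the tied
rows. It rebuilds df1 from the sample data posted earlier in the thread and
assumes Q has no NA values. The names txt, all_min and ties_ave are only
illustrative. The first version swaps which.min() for a logical comparison
against the group minimum; the second does the same thing with ave() and no
split/rbind step.

```r
# Sample data as posted earlier in the thread
txt <- '41003 81 1 19 0.16 7.17 2.5
41003 77 9 22 0.197 6.8 2.2
41003 79 7 28 0.21 4.7 6.2
41005 79 8 17 0.21 5.5 7.2
41005 80 10 30 0.21 6.84 2.6
41005 80 12 20 0.21 6.84 2.4
41005 79 6 14 0.217 5.61 3.55
41009 79 2 21 0.218 5.56 4.04
41009 79 5 27 0.218 6.4 3.12
41009 80 11 29 0.22 6.84 2.8
41009 78 5 28 0.232 6 3.2
41009 81 8 20 0.233 6.39 1.6
41009 79 9 30 0.24 5.6 7.5
41017 79 10 20 0.24 5.3 7.1
41017 80 7 30 0.24 6.73 2.6'
df1 <- read.table(text = txt,
                  col.names = c("Code", "Y", "M", "D", "Q", "N", "O"))
df1$Code <- factor(df1$Code)

# Keep every row whose Q equals its group's minimum (ties included).
# Assumes no NA in Q; otherwise min() would need na.rm = TRUE.
all_min <- do.call(rbind, by(df1, df1$Code, \(x) x[x$Q == min(x$Q), ]))

# Same idea without splitting: ave() returns the per-group minimum
# aligned with the original rows, so one comparison selects the ties.
ties_ave <- df1[df1$Q == ave(df1$Q, df1$Code, FUN = min), ]
```

With these data both versions return more than the 4 rows shown above,
since several Codes have more than one row at the minimum Q.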

Cheers,
Bert

On Thu, Aug 25, 2022 at 12:22 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
> OK, what about
>
>
> res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),])
> do.call(rbind, res)
> #         Code  Y  M  D     Q    N    O
> #  41003 41003 81  1 19 0.160 7.17 2.50
> #  41005 41005 79  8 17 0.210 5.50 7.20
> #  41009 41009 79  2 21 0.218 5.56 4.04
> #  41017 41017 79 10 20 0.240 5.30 7.10
>
>
> A dplyr solution.
>
>
>
> suppressPackageStartupMessages(library(dplyr))
>
> df1 %>%
>    group_by(Code) %>%
>    slice_min(Q) %>%
>    slice_head(n = 1)
> #  # A tibble: 4 × 7
> #  # Groups:   Code [4]
> #    Code      Y     M     D     Q     N     O
> #    <fct> <int> <int> <int> <dbl> <dbl> <dbl>
> #  1 41003    81     1    19 0.16   7.17  2.5
> #  2 41005    79     8    17 0.21   5.5   7.2
> #  3 41009    79     2    21 0.218  5.56  4.04
> #  4 41017    79    10    20 0.24   5.3   7.1
>
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 05:56 on 25/08/2022, javad bayat wrote:
> > Dear all,
> > Many thanks for the suggested methods and code, but unfortunately they
> > did not give the desired result.
> > The code you provided is correct, but it does not return the whole row
> > that contains the minimum of "Q".
> > The result should have 4 rows, each with the minimum value of "Q" and the
> > other column values, as below:
> >
> > Code   Y   M   D   Q      N     O
> > 41003  81  1   19  0.16   7.17  2.5
> > 41005  79  8   17  0.21   5.5   7.2
> > 41009  79  2   21  0.218  5.56  4.04
> > 41017  79  10  20  0.24   5.3   7.1
> >
> > Sincerely
> >
> > On Wed, Aug 24, 2022 at 9:24 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >
> >> Hello,
> >>
> >> Here are two options: the 1st outputs a vector, the 2nd a data.frame.
> >>
> >>
> >> x<-'41003 81 1 19 0.16 7.17 2.5
> >> 41003 77 9 22 0.197 6.8 2.2
> >> 41003 79 7 28 0.21 4.7 6.2
> >> 41005 79 8 17 0.21 5.5 7.2
> >> 41005 80 10 30 0.21 6.84 2.6
> >> 41005 80 12 20 0.21 6.84 2.4
> >> 41005 79 6 14 0.217 5.61 3.55
> >> 41009 79 2 21 0.218 5.56 4.04
> >> 41009 79 5 27 0.218 6.4 3.12
> >> 41009 80 11 29 0.22 6.84 2.8
> >> 41009 78 5 28 0.232 6 3.2
> >> 41009 81 8 20 0.233 6.39 1.6
> >> 41009 79 9 30 0.24 5.6 7.5
> >> 41017 79 10 20 0.24 5.3 7.1
> >> 41017 80 7 30 0.24 6.73 2.6'
> >> df1 <- read.table(textConnection(x))
> >> names(df1) <- scan(what = character(),
> >>                      text = 'Code Y M D Q N O')
> >> df1$Code <- factor(df1$Code)
> >>
> >> # 1st option
> >> with(df1, tapply(Q, Code, min))
> >> #  41003 41005 41009 41017
> >> #  0.160 0.210 0.218 0.240
> >>
> >> # 2nd option
> >> aggregate(Q ~ Code, df1, min)
> >> #     Code     Q
> >> #  1 41003 0.160
> >> #  2 41005 0.210
> >> #  3 41009 0.218
> >> #  4 41017 0.240
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> At 08:44 on 24/08/2022, javad bayat wrote:
> >>> Dear all;
> >>> I am trying to get the minimum value of a column based on a factor column
> >>> of the same data frame. My data frame is like the below:
> >>> Code Y M D Q N O
> >>> 41003 81 1 19 0.16 7.17 2.5
> >>> 41003 77 9 22 0.197 6.8 2.2
> >>> 41003 79 7 28 0.21 4.7 6.2
> >>> 41005 79 8 17 0.21 5.5 7.2
> >>> 41005 80 10 30 0.21 6.84 2.6
> >>> 41005 80 12 20 0.21 6.84 2.4
> >>> 41005 79 6 14 0.217 5.61 3.55
> >>> 41009 79 2 21 0.218 5.56 4.04
> >>> 41009 79 5 27 0.218 6.4 3.12
> >>> 41009 80 11 29 0.22 6.84 2.8
> >>> 41009 78 5 28 0.232 6 3.2
> >>> 41009 81 8 20 0.233 6.39 1.6
> >>> 41009 79 9 30 0.24 5.6 7.5
> >>> 41017 79 10 20 0.24 5.3 7.1
> >>> 41017 80 7 30 0.24 6.73 2.6
> >>>
> >>> I want to get, for each level of the "Code" column (a factor), the row
> >>> with the minimum value of "Q", keeping all the other columns of that row.
> >>> Overall this should give me 4 rows. Below is the code I used, but it did
> >>> not give me what I wanted.
> >>>
> >>>> x[which(x$Q == min(x$Q)),]
> >>>
> >>> Sincerely
> >>>
> >>>
> >>>
> >>
> >
> >
>


