[R] Getting minimum value of a column according a factor column of a dataframe

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Aug 26 06:14:02 CEST 2022


Hello,

To return 2 rows for each Code, one for the min and another for the max, 
try the following.

I'm borrowing Bert's by() idea, it makes everything simpler.
There is a hack to have the original Code order kept, since the final 
result res should have two rows for each Code, see what is order()'ed below.


# the output has 2 consecutive rows with
# the same Code, so repeat the unique Codes
i <- order(rep(unique(df1$Code), each = 2))
res <- by(df1, df1$Code, \(x) x[c(which.min(x$Q), which.max(x$Q)), ])
res <- do.call(rbind, res)[order(i), ]

# remake the row names, they're ugly after rbind
row.names(res) <- NULL
res


Hope this helps,

Rui Barradas

Às 19:02 de 25/08/2022, javad bayat escreveu:
> ;Dear all
> First of all I appreciate you for the answers you have sent. I did the
> codes that Rui provided and I got what I wanted.
> "
> res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),])
> res <- do.call(rbind, res)
> i <- order(unique(df1$Code))
> res[order(i), ]
> "
> I think I should explain more about my request. I had a large data frame
> (11059 rows and 16 columns). The Code column represented the stations code,
> totally the number of stations were 128. At each station I had many
> measured variables, like Q and N and O, and these variables were measured
> in different years and days. The days that data were measured were
> different for each station, and due to this reason I had different rows for
> stations. For example, station number one (41009) had 158 rows and station
> number 2 (41011) had 113 rows. Note that the station's codes are not in
> order format (e.g smallest to largest).
> Back to my request, I wanted to extract the minimum value of the Q for each
> station (based on the Code column). The problem was that I wanted to have
> other column values which were measured for this minimum of the Q.
> I hope my explanation was clear enough. As I said before, I used the Rui's
> codes and I got what I wanted. Although, other solutions provided by others
> were all correct.
> 
> Regarding my request, unfortunately I faced another problem. I had to
> extract the maximum of the Q and put it exactly under the minimum of the Q.
> Something like the below one:
> "
> 
> Code
> 
>                Y
> 
>                M
> 
>                 D
> 
>             Q
> 
>              N
> 
>               O
> 
> 41003
> 
> 81
> 
> 1
> 
> 19
> 
> 0.16
> 
> 7.17
> 
> 2.5
> 
> 41003
> 
> 79
> 
> 8
> 
> 17
> 
> 10.21
> 
> 5.5
> 
> 7.2
> 
> 41009
> 
> 79
> 
> 2
> 
> 21
> 
> 0.218
> 
> 5.56
> 
> 4.04
> 41009 79 10 20 12.24 5.3 7.1
> .
> .
> .
> .
> "
> I extract both min and max according to the codes, and I have 2 dataframes,
> one for the minimum values and another for the max values. Both dataframe
> have a Code column which is exactly similar.
> Can I extract both min and max simultaneously or I have to combine two
> dataframes?
> I used the rbind and merge function but they did not give the desired
> results.
>> df3 = merge (df1, df2, by = "Code")
> The result of this code adds a second dataframe as columns to the first
> one. I want the first row of the second dataframe put below the first row
> of the first dataframe and so on. I used a function to do this but it seems
> it does not work correctly.
> 
>> fun2 = function(x,y){
>                  i = 1
>                  for(i in x) {
>                        if (x[i,1] == y[i,1]){
>                            rbind(x[i,],y[i,])
>                            i = i+1
>                  }
>                  }
>                  }
>> fun2(df1, df2)
> 
> Sincerely
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross using gmail.com> wrote:
> 
>> Yes, Timothy, the request was not seen by all of us as the same.
>>
>> Indeed if the request was to show a subset of the original data consisting
>> of only the rows that were the minimum for each Code and also showed ties,
>> then the solution is a tad more complex. I would then do something along
>> the
>> lines of what others showed such as generating another column showing the
>> minimum for each row and then showing only rows that matched their value in
>> two columns or whatever was needed.
>>
>> As noted, keeping the output in a specific order was not initially
>> requested.
>>
>> Keeping the data in some order is a common enough request but in this
>> situation, I suspect the order many might want would be the one showing the
>> minimums in order, not the codes in the original order.
>>
>> -----Original Message-----
>> From: Ebert,Timothy Aaron <tebert using ufl.edu>
>> Sent: Thursday, August 25, 2022 11:59 AM
>> To: avi.e.gross using gmail.com
>> Cc: R-help using r-project.org
>> Subject: RE: [R] Getting minimum value of a column according a factor
>> column
>> of a dataframe
>>
>> My assumption (maybe wrong) was that we needed to keep the other variables.
>> I want to find the values of Y, M, D, N, and O for the minimum value of Q
>> within each unique value of Code, keeping the data in the original order.
>> All one need to do is filter Q in the original dataframe by your answer for
>> minQ.
>>
>> Keeping the data in the original order seems unnecessary, but that is what
>> was asked in a later post.
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
>



More information about the R-help mailing list