[R] fast way to find most common value across columns dataframe

Jim Lemon
Sat Oct 31 10:28:35 CET 2020


Hi Luigi,
If I understand your request:

library(prettyR)
# drop the row-id column n so that only the character columns are tabulated
apply(as.matrix(df[,-1]),1,Mode)
[1] "C"       "B"       "D"       ">1 mode" ">1 mode" ">1 mode" "D"
[8] "C"       "B"       ">1 mode"
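
For a data frame of the size described in the question (1 000 000 rows, 1000 columns), calling a function once per row may still be slow. One possible base-R alternative, offered only as a sketch and not timed at that scale, is to count each distinct value over whole rows with rowSums() and then pick the largest count per row with max.col(). The names m, vals, counts, best, most and tied are illustrative and do not come from the original code. Unlike Mode(), which reports ">1 mode", this version resolves ties by taking the first value in sorted order and flags tied rows separately.

```r
# Illustrative data in the same shape as the example in the question
set.seed(1)
V <- c("A", "B", "C", "D")
df <- data.frame(n = 1:10,
                 col_01 = sample(V, 10, replace = TRUE),
                 col_02 = sample(V, 10, replace = TRUE),
                 col_03 = sample(V, 10, replace = TRUE),
                 col_04 = sample(V, 10, replace = TRUE),
                 col_05 = sample(V, 10, replace = TRUE),
                 stringsAsFactors = FALSE)

# Character matrix without the row-id column n
m <- as.matrix(df[, -1])

# Distinct values that actually occur, in sorted order
vals <- sort(unique(as.vector(m)))

# One column of per-row counts for each distinct value
# (assumes more than one row, so that vapply() returns a matrix)
counts <- vapply(vals, function(v) rowSums(m == v), numeric(nrow(m)))

# Index of the largest count in each row; ties go to the first value in vals
best <- max.col(counts, ties.method = "first")
most <- vals[best]

# TRUE where more than one value reaches the row maximum
row_max <- counts[cbind(seq_len(nrow(counts)), best)]
tied <- rowSums(counts == row_max) > 1

df$most <- most
df$tied <- tied
```

The work here scales with the number of distinct values rather than the number of rows, which tends to help when there are only a few possible characters. Each comparison m == v allocates a logical matrix as large as m, so with very large data it may be necessary to process the rows in chunks.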

Jim

On Sat, Oct 31, 2020 at 7:56 PM Luigi Marongiu <marongiu.luigi using gmail.com>
wrote:

> Hello,
> I have a large dataframe (1 000 000 rows, 1000 columns) where the
> columns contain a character. I would like to determine the most common
> character for each row.
> In the example below, I can parse one row at a time and find the
> most common character (apart from ties...). But I think this will be
> very slow and memory-consuming.
> Is there a way to run it more efficiently?
> Thank you
>
> ```
> V = c("A", "B", "C", "D")
> df = data.frame(n = 1:10,
>        col_01 = sample(V, 10, replace = TRUE, prob = NULL),
>        col_02 = sample(V, 10, replace = TRUE, prob = NULL),
>        col_03 = sample(V, 10, replace = TRUE, prob = NULL),
>        col_04 = sample(V, 10, replace = TRUE, prob = NULL),
>        col_05 = sample(V, 10, replace = TRUE, prob = NULL),
>        stringsAsFactors = FALSE)
>
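> # Slow baseline: tabulate one row at a time; which.max() keeps the first value on ties
> q = character(nrow(df))
> for(i in seq_len(nrow(df))) {
>   x = as.vector(t(df[i,2:ncol(df)]))
>   q[i] = names(which.max(table(x)))
> }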
> df$most = q
> ```
