Calculate the mode of a database in R

I would like to create a last column in the `result` database, which determined the `mode` between `df1`, `df2` and `df3` for each line.

    result<-structure(list(n = c(7, 8, 9, 10, 11, 12, 13, 14, 15, 
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 
    32, 33, 34, 35), df1 = c(9L, 29L, 28L, 27L, 25L, 26L, 24L, 20L, 
    21L, 22L, 23L, 15L, 12L, 17L, 18L, 19L, 16L, 13L, 14L, 5L, 6L, 
    7L, 8L, 1L, 10L, 11L, 4L, 2L, 3L), df2 = c(3, 29, 28, 27, 26, 
    25, 24, 23, 22, 21, 20, 15, 12, 19, 18, 17, 16, 14, 13, 11, 10, 
    9, 8, 7, 6, 5, 4, 1, 2), df3 = c(1L, 29L, 28L, 27L, 25L, 26L, 
    24L, 20L, 21L, 22L, 23L, 15L, 12L, 17L, 18L, 19L, 16L, 13L, 14L, 
    5L, 6L, 7L, 8L, 9L, 10L, 11L, 4L, 2L, 3L)), row.names = c(NA, 
    -29L), class = "data.frame")

    n df1 df2 df3
1   7   9   3   1
2   8  29  29  29
3   9  28  28  28
4  10  27  27  27
5  11  25  26  25
6  12  26  25  26
7  13  24  24  24
8  14  20  23  20
9  15  21  22  21
10 16  22  21  22
11 17  23  20  23
12 18  15  15  15
13 19  12  12  12
14 20  17  19  17
15 21  18  18  18
16 22  19  17  19
17 23  16  16  16
18 24  13  14  13
19 25  14  13  14
20 26   5  11   5
21 27   6  10   6
22 28   7   9   7
23 29   8   8   8
24 30   1   7   9
25 31  10   6  10
26 32  11   5  11
27 33   4   4   4
28 34   2   1   2
29 35   3   2   3

Does this do what you want?

library(dplyr)

result <- result |> rowwise() |> 
   mutate(Mode = max(c_across(df1:df3)))
result
# A tibble: 29 x 5
# Rowwise: 
       n   df1   df2   df3  Mode
   <dbl> <int> <dbl> <int> <dbl>
 1     7     1     1     1     1
 2     8    29    29    29    29
 3     9    28    28    28    28
 4    10    27    27    27    27
 5    11    25    26    25    26
 6    12    26    25    26    26
 7    13    24    24    24    24
 8    14    20    23    20    23
 9    15    21    22    21    22
10    16    22    21    22    22

Thanks for reply @FJCC ! It's more or less like that, but I found it strange that in your line 8, for example, the mode value is 23 and not 20, as it appears 2 times 20 and only once 23. I would like the value of mode.

Oh, sorry, now I see what you mean.

library(dplyr)
ModeFunc <- function(Vec) {
   tmp <- sort(table(Vec),decreasing = TRUE)
   Nms <- names(tmp)
   as.numeric(Nms[1])
   }
result <- result |> rowwise() |> 
   mutate(Mode = ModeFunc(c_across(df1:df3)))
result
# A tibble: 29 x 5
# Rowwise: 
       n   df1   df2   df3  Mode
   <dbl> <int> <dbl> <int> <dbl>
 1     7     1     1     1     1
 2     8    29    29    29    29
 3     9    28    28    28    28
 4    10    27    27    27    27
 5    11    25    26    25    25
 6    12    26    25    26    26
 7    13    24    24    24    24
 8    14    20    23    20    20
 9    15    21    22    21    21
10    16    22    21    22    22
# ... with 19 more rows
2 Likes

@FJCC 's way of doing it is more elegant doing row_wise operations, but here's another way since I put the work in before seeing their answer:

modes <-
    result %>%
    pivot_longer(
        cols = starts_with("df"),
        names_to = "df"
    ) %>%
    add_count(n, value) %>%
    group_by(n) %>%
    summarise(mode = .data[["value"]][n == max(n)][1])

new_result <-
    result %>%
    left_join(modes)

new_result
1 Like

Thank you very much @FJCC . That's it. I just have one doubt: I even updated the database on the question. Notice that I have in the first line 3 different values. If I use your code and make the mode, I get a value of 1. However, in this case there is no mode, right? since the three values are different.

You have to decide what you want to do in such cases

Thanks for reply @nirgrahamuk! I believe that in these cases insert "-"

If cases with no mode are marked with "-", then the whole column will contain characters instead of numbers. That may or may not be acceptable for you purpose.
Is there any chance a row will contain all NA values? If so, the function should also handle that.

I believe that instead of inserting "-", cases that have no mode can be listed as NA. So for the database above for the line 1 and line 24 it would be NA and not 1. The rest is right.

Output table generated by your code:

> data.frame(result)
    n df1 df2 df3 Mode
1   7   9   3   1    1
2   8  29  29  29   29
3   9  28  28  28   28
4  10  27  27  27   27
5  11  25  26  25   25
6  12  26  25  26   26
7  13  24  24  24   24
8  14  20  23  20   20
9  15  21  22  21   21
10 16  22  21  22   22
11 17  23  20  23   23
12 18  15  15  15   15
13 19  12  12  12   12
14 20  17  19  17   17
15 21  18  18  18   18
16 22  19  17  19   19
17 23  16  16  16   16
18 24  13  14  13   13
19 25  14  13  14   14
20 26   5  11   5    5
21 27   6  10   6    6
22 28   7   9   7    7
23 29   8   8   8    8
24 30   1   7   9    1
25 31  10   6  10   10
26 32  11   5  11   11
27 33   4   4   4    4
28 34   2   1   2    2
29 35   3   2   3    3

Expected output

> data.frame(result)
    n df1 df2 df3 Mode
1   7   9   3   1   NA
2   8  29  29  29   29
3   9  28  28  28   28
4  10  27  27  27   27
5  11  25  26  25   25
6  12  26  25  26   26
7  13  24  24  24   24
8  14  20  23  20   20
9  15  21  22  21   21
10 16  22  21  22   22
11 17  23  20  23   23
12 18  15  15  15   15
13 19  12  12  12   12
14 20  17  19  17   17
15 21  18  18  18   18
16 22  19  17  19   19
17 23  16  16  16   16
18 24  13  14  13   13
19 25  14  13  14   14
20 26   5  11   5    5
21 27   6  10   6    6
22 28   7   9   7    7
23 29   8   8   8    8
24 30   1   7   9   NA
25 31  10   6  10   10
26 32  11   5  11   11
27 33   4   4   4    4
28 34   2   1   2    2
29 35   3   2   3    3
library(dplyr)
ModeFunc <- function(Vec) {
   tmp <- sort(table(Vec),decreasing = TRUE)
   Nms <- names(tmp)
   if(max(tmp) > 1) {
     as.numeric(Nms[1])
   } else NA
 }
result <- result |> rowwise() |> 
   mutate(Mode = ModeFunc(c_across(df1:df3)))
result
# A tibble: 29 x 5
# Rowwise: 
       n   df1   df2   df3  Mode
   <dbl> <int> <dbl> <int> <dbl>
 1     7     1     2     3    NA
 2     8    29    29    29    29
 3     9    28    28    28    28
 4    10    27    27    27    27
 5    11    25    26    25    25
 6    12    26    25    26    26
 7    13    24    24    24    24
 8    14    20    23    20    20
 9    15    21    22    21    21
10    16    22    21    22    22
# ... with 19 more rows
2 Likes

Thank you very much @FJCC!

1 Like

I don't know if it helps, but here is my general purpose mode function
I do feel that mode is neglected in R (and DAX)
This works on numeric, character and logical and returns NA if no non-NA and the highest in sort order if there is a tie.
It suffers from the fact that max(c(TRUE, TRUE) ) is 1 and not TRUE but that is dealt with

MostCommon <- function(x) {
  ux <- unique(x)
  uxnotna <- ux[which(!is.na(ux))]
  if(length(uxnotna) > 0) {
    tab <- tabulate(match(x, uxnotna))
    candidates = uxnotna[tab == max(tab)]
    if (is.logical(x)) {
      any(candidates) # return TRUE if any true. max returns an integer
    } else {
      max(candidates) # return highest (ie max) value
    }
  } else {
    ux   # this returns the NA with the right class. ie that of x
  }
}
1 Like

Hi, @mikecrobp

When I use your function on result dataframe I get an error:

Error in `[.data.frame`(ux, which(!is.na(ux))) :columns not selected

with:

MostCommon(result[,2:4])

the same error pops up.

It is for use on a vector - ie a single column

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.