generate a new variable with maximum value by group

jlee12.0 · April 1, 2021, 2:53pm

Hi there - I am new to R. It might be very simple but I need help.

Group and value are given, and I want to add one more variable called "max_value" which indicates the maximum value of each group.

I tried...
data[, max_value := max(value), by = group]
but the result does not seem to be correct.

Group  value  max_value
A      3      3
A      1      3
A      2      3
B      3      6
B      6      6
B      2      6
C      4      10
C      1      10
C      10     10

Thank you in advance.

cmeuli07 · April 1, 2021, 3:36pm

If you want the same number of rows returned as entered, do this:

require(dplyr)
data <- data %>%
   group_by(group) %>%
   mutate(max_value = max(value))

If you want one row per group (at most the number of rows entered, but likely fewer), do this:

require(dplyr)
data <- data %>%
   group_by(group) %>%
   summarize(max_value = max(value))
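
To make the difference between the two concrete, here is a small sketch that rebuilds the table from the question and runs both versions. The object names with_max and per_group are just illustrative, and the column is assumed to be named group in lower case to match the code above.

```r
library(dplyr)

# Sample data copied from the question's table (column names in lower case)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  value = c(3, 1, 2, 3, 6, 2, 4, 1, 10)
)

# mutate() keeps all 9 rows and repeats each group's maximum on every row
with_max <- data %>%
  group_by(group) %>%
  mutate(max_value = max(value)) %>%
  ungroup()

# summarize() collapses the data to one row per group
per_group <- data %>%
  group_by(group) %>%
  summarize(max_value = max(value))

print(with_max)
print(per_group)
```

The ungroup() at the end of the first pipeline is optional; it just drops the grouping so later steps are not accidentally run per group.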

jlee12.0 · April 1, 2021, 4:26pm

Thanks for the quick reply.
One additional question: if I have multiple variables whose maximum values I want to find, will the code below work? Or do I need to run it separately for each variable?

require(dplyr)
data <- data %>%
   group_by(group) %>%
   mutate(max_value = max(value), max_weight = max(weight), max_height = max(height))

mara · April 1, 2021, 4:43pm

This would be a good place to use across().

In this case it would look something like this:

data %>%
  group_by(group) %>%
    mutate(across(c(value, weight, height), max, .names = "max_{.col}"))
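
For anyone who wants to run this end to end, here is a minimal sketch on toy data. The weight and height numbers are made up purely for illustration, and result is just a placeholder name.

```r
library(dplyr)

# Toy data: the weight and height values are invented for illustration
data <- data.frame(
  group  = c("A", "A", "B", "B"),
  value  = c(3, 1, 6, 2),
  weight = c(60, 72, 55, 80),
  height = c(170, 165, 180, 175)
)

# across() applies max to each listed column within each group;
# .names builds the new column names from the original ones
result <- data %>%
  group_by(group) %>%
  mutate(across(c(value, weight, height), max, .names = "max_{.col}")) %>%
  ungroup()

print(names(result))
print(result)
```

The original columns are kept, and max_value, max_weight, and max_height are added alongside them.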

cmeuli07 · April 1, 2021, 5:17pm

jlee12.0 - Yes, that should work. One thing I always recommend is just trying things yourself. It's good to ask questions when you get stuck, but sometimes I feel like beginners are scared to hit the "run" button because they don't know if their code is right. The best way to find out if your code works is to run it! If it breaks, no harm no foul! I run scripts that error out all the time, many, many times in a row even. That's how I learn!

Also, Mara's code rocks! It does the same thing that your code does, except it's less typing. By calling across(), she's applying the max function to the three columns listed in c().

mara · April 1, 2021, 6:32pm

This is great advice!

across() is super powerful, and I could've done the same thing in a way that allows for sort of "dynamic" column names, since you might want multiple functions run across several columns (I'll use summarize() here, since it makes more sense). Note, I'm replacing the hard-coded max prefix in the .names argument with {.fn}. {.fn} gets its names from the named list of functions I'll run over the columns we've selected.

data %>%
  group_by(group) %>%
    summarize(across(c(value, weight, height), list(max = max, min = min), .names = "{.fn}_{.col}"))
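
A runnable sketch of the same idea, again on invented toy data (the weight and height values and the name stats are placeholders), shows how the names come out:

```r
library(dplyr)

# Toy data: the weight and height values are invented for illustration
data <- data.frame(
  group  = c("A", "A", "B", "B"),
  value  = c(3, 1, 6, 2),
  weight = c(60, 72, 55, 80),
  height = c(170, 165, 180, 175)
)

# Each function in the named list is applied to each selected column;
# {.fn} is filled in from the list names, {.col} from the column names
stats <- data %>%
  group_by(group) %>%
  summarize(across(c(value, weight, height),
                   list(max = max, min = min),
                   .names = "{.fn}_{.col}"))

print(names(stats))
print(stats)
```

The result has one row per group and one column for every function and column pair, such as max_value and min_value.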

There are also lots of ways to select which columns you're running the functions on, since it uses tidyselect syntax.
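
As one example of that, where(is.numeric) picks every numeric column without naming any of them. This is only a sketch on invented toy data; the label column and the name numeric_max are hypothetical.

```r
library(dplyr)

# Toy data: weight values and the label column are invented for illustration
data <- data.frame(
  group  = c("A", "A", "B", "B"),
  value  = c(3, 1, 6, 2),
  weight = c(60, 72, 55, 80),
  label  = c("w", "x", "y", "z")
)

# where(is.numeric) selects value and weight but skips the character
# column label; the grouping column is never included by across()
numeric_max <- data %>%
  group_by(group) %>%
  summarize(across(where(is.numeric), max, .names = "max_{.col}"))

print(numeric_max)
```

Other tidyselect helpers such as starts_with() or everything() can be used in the same position.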

The colwise dplyr vignette is also really helpful.
