Analyze one variable based on two others

Ahoy and welcome. First off, please note our homework policy (FAQ: Homework Policy). It has tips on how best to work with this forum to get help with homework, and it explains what might result in your post being hidden (for example, please never post homework questions verbatim).

We'd also strongly encourage you to get comfortable posing these kinds of questions with a reproducible example (more on that in FAQ: Tips for writing R-related questions).

Using the tidyverse, the following reprex is one approach.


library(dplyr)

# set up the data:
# each country/year combo has several values (random here, so yours will differ)
df <- data.frame(
  year = rep(c(1990, 1990, 1991, 1991), 2),
  country = rep(c("a", "b"), 4)
) %>% 
  mutate(
    value = rnorm(n())
  ) %>% 
  arrange(country, year)
df
#>   year country       value
#> 1 1990       a -0.38071845
#> 2 1990       a -1.47383021
#> 3 1991       a  0.18011637
#> 4 1991       a  1.53552271
#> 5 1990       b  0.09491687
#> 6 1990       b -0.75376016
#> 7 1991       b -0.49575320
#> 8 1991       b  1.13004382

# new data frame
# for each country-year, get the highest value
df_max <- df %>% 
  group_by(country, year) %>% 
  summarise(
    max_val = max(value)
  )
df_max
#> # A tibble: 4 x 3
#> # Groups:   country [2]
#>   country  year max_val
#>   <fct>   <dbl>   <dbl>
#> 1 a        1990 -0.381 
#> 2 a        1991  1.54  
#> 3 b        1990  0.0949
#> 4 b        1991  1.13

Created on 2020-03-25 by the reprex package (v0.3.0)
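
If you'd rather not load dplyr, here is a rough base R sketch of the same idea using aggregate(). The data frame df_fixed and its values are made up for illustration (fixed numbers instead of rnorm(), so the result is the same on every run), and the names df_fixed and df_max_base are my own rather than anything from your data.

```r
# Hypothetical fixed values standing in for the random ones in the reprex
df_fixed <- data.frame(
  year    = c(1990, 1990, 1991, 1991, 1990, 1990, 1991, 1991),
  country = c("a", "a", "a", "a", "b", "b", "b", "b"),
  value   = c(-0.4, -1.5, 0.2, 1.5, 0.1, -0.8, -0.5, 1.1)
)

# One maximum per country-year, similar to group_by() + summarise()
df_max_base <- aggregate(value ~ country + year, data = df_fixed, FUN = max)

# Rename the result column and sort to match the dplyr output's layout
names(df_max_base)[names(df_max_base) == "value"] <- "max_val"
df_max_base <- df_max_base[order(df_max_base$country, df_max_base$year), ]
df_max_base
```

Both approaches should give one row per country-year pair; which one you use is mostly a matter of taste and of whether the rest of your code is already in the tidyverse.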

For a great introduction to R for data science generally, I'd really encourage you to check out the R for Data Science (R4DS) book. It has a section on data transformation that goes over the functions I used here.
