Maximum value by year

Slavek · December 16, 2019, 11:58am

Dear all,
I'm sure the solution is easier than I think but I have some issues with maximum value by year.
I need to find maximum number of Sequence for each ReferenceNumber and each year.

I simply want to modify my code to see maximum values of Sequence but not only overall but also with 2016 and 2017 column.

response.data <- data.frame(stringsAsFactors=FALSE,
                            ReferenceNumber = c("aaa", "aaa", "aaa", "eee", "eee", "eee", "eee",
                                                "hhh",
                                                "hhh",
                                                "hhh",
                                                "hhh"),
                            Sequence = c(2, 1, 3, 2, 1, 5, 3, 2, 5, 4, 5),
                            S.Year = c(2016, 2016, 2017, 2016, 2016, 2017, 2017, 2017, 2017,
                                       2017,
                                       2017))

response.data %>% 
  group_by(ReferenceNumber) %>% 
  summarise(Max = max(Sequence))

As a result I should see maximum Sequence values like that:

# A tibble: 4 x 2
  ReferenceNumber  Total 2016 2017
1 aaa                 3    2    3
2 eee                 5    2    5
3 hhh                 5         5

>

This might be extremely simple so can you help please?

Hlynur · December 16, 2019, 1:09pm

Hi, here's what you can do.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

response.data <- data.frame(stringsAsFactors=FALSE,
                            ReferenceNumber = c("aaa", "aaa", "aaa", "eee", "eee", "eee", "eee",
                                                "hhh",
                                                "hhh",
                                                "hhh",
                                                "hhh"),
                            Sequence = c(2, 1, 3, 2, 1, 5, 3, 2, 5, 4, 5),
                            S.Year = c(2016, 2016, 2017, 2016, 2016, 2017, 2017, 2017, 2017,
                                       2017,
                                       2017))

response.data %>% 
  group_by(ReferenceNumber, S.Year) %>% 
  summarise(Max = max(Sequence)) %>% 
  pivot_wider(names_from = "S.Year", values_from = "Max") %>% 
  mutate(Total = pmax(`2016`, `2017`, na.rm = TRUE)) %>% 
  select(ReferenceNumber, Total, everything())
#> # A tibble: 3 x 4
#> # Groups:   ReferenceNumber [3]
#>   ReferenceNumber Total `2016` `2017`
#>   <chr>           <dbl>  <dbl>  <dbl>
#> 1 aaa                 3      2      3
#> 2 eee                 5      2      5
#> 3 hhh                 5     NA      5

^{Created on 2019-12-16 by the reprex package (v0.3.0)}

The changes are that you also group by S.Year and then pivot the data frame to a wider format using tidyr::pivot_wider. Then you can create the Total column by using pmax on 2016 and 2017

response.data %>% 
  group_by(ReferenceNumber, S.Year) %>% 
  summarise(Max = max(Sequence)) %>% 
  pivot_wider(names_from = "S.Year", values_from = "Max") %>% 
  mutate(Total = pmax(`2016`, `2017`, na.rm = TRUE)) %>% 
  select(ReferenceNumber, Total, everything())

Slavek · December 16, 2019, 2:30pm

Excellent! Thank you.
I have a final question. In my real data I have more years (not only 2016 and 2017). How can I make this code for Total more general?

Slavek · December 16, 2019, 2:51pm

Also, (I know it's beyond the original question) is it possible easily pivot the results this way?

# A tibble: 4 x 2
  ReferenceNumber     Year Sequence
1 aaa                 2016    2
2 aaa                 2017    3
3 eee                 2016    2
4 eee                 2017    5
5 hhh                 2017    5

Thank you in advance for your help.

Hlynur · December 16, 2019, 3:04pm

Ahh, in the case of a lot of years, it's perhaps best to do it this way:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

response.data <- data.frame(stringsAsFactors=FALSE,
                            ReferenceNumber = c("aaa", "aaa", "aaa", "eee", "eee", "eee", "eee",
                                                "hhh",
                                                "hhh",
                                                "hhh",
                                                "hhh"),
                            Sequence = c(2, 1, 3, 2, 1, 5, 3, 2, 5, 4, 5),
                            S.Year = c(2016, 2016, 2017, 2016, 2016, 2017, 2017, 2017, 2017,
                                       2017,
                                       2017))

response.data %>% 
  group_by(ReferenceNumber, S.Year) %>% 
  summarise(Max = max(Sequence)) %>% 
  group_by(ReferenceNumber) %>% 
  mutate(Total = max(Max)) %>% 
  pivot_wider(names_from = "S.Year", values_from = "Max") %>% 
  ungroup()
#> # A tibble: 3 x 4
#>   ReferenceNumber Total `2016` `2017`
#>   <chr>           <dbl>  <dbl>  <dbl>
#> 1 aaa                 3      2      3
#> 2 eee                 5      2      5
#> 3 hhh                 5     NA      5

^{Created on 2019-12-16 by the reprex package (v0.3.0)}

Hlynur · December 16, 2019, 3:08pm

If I understand your question correctly, it would be enough to do.

response.data %>% 
  group_by(ReferenceNumber, S.Year) %>% 
  summarise(Sequence = max(Sequence)) %>% 
  ungroup()
#> # A tibble: 5 x 3
#>   ReferenceNumber S.Year Sequence
#>   <chr>            <dbl>    <dbl>
#> 1 aaa               2016        2
#> 2 aaa               2017        3
#> 3 eee               2016        2
#> 4 eee               2017        5
#> 5 hhh               2017        5

Is that what you're after?

Slavek · December 16, 2019, 3:10pm

Thank you very much. I also found a solution for my final question myself:

response.data %>% 
  select(S.Year, ReferenceNumber, Sequence) %>% 
  group_by(ReferenceNumber, S.Year) %>% 
  summarise_all(list(Maximum= max))

system · December 23, 2019, 3:10pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.