Aggregate rows in the dataframe

andreleite · June 27, 2021, 12:06pm

Hi,

I have the yearly values of exports divided by partners/countries, but I need to aggregate them in regional blocs
For instance, create one group called "Pacific Alliance" composed by Mexico, Colombia, Peru and Chile and another called "Andean Community" (Colombia, Peru, Ecuador and Bolivia). As you can see, in some cases, the same country belongs to more than one bloc. Do you know how to deal with that?
In the end, instead of a column with counties, I should have a column indicating the trade value of Pacific Alliance in 2010 and in 2011 and the same goes for other blocs that I need to create

My reprex is:

data.frame(
        Year = c(2011L,2011L,2011L,2011L,2011L,2011L,
                 2011L,2011L,2011L,2011L,2011L,2011L,2011L,2011L,2011L,
                 2011L,2011L,2011L,2010L,2010L,2010L,2010L,2010L,2010L,
                 2010L,2010L,2010L,2010L,2010L,2010L,2010L,2010L,2010L,
                 2010L,2010L,2010L),
  TradeValue = c(663536,305249525,1370308420,2205006468,
                 40354718,18525637,1908591669,1395268,293072616,704938268,
                 11219301,12729123,1396866761,776401,66465942,2833496,
                 22148147,1750409643,674346,130382390,1040262882,906924506,
                 34583261,2140942,1824535234,1272176,173326030,638214736,
                 7639368,10267257,1131840170,816590,4195982,2104475,14299677,
                 1422877052),
     Partner = as.factor(c("Antigua and Barbuda",
                           "Argentina","Brazil","Chile","Cuba","Dominica",
                           "Ecuador","Grenada","Honduras","Mexico",
                           "Nicaragua","Paraguay","Peru","Saint Kitts and Nevis",
                           "Saint Lucia","Saint Vincent and the Grenadines",
                           "Uruguay","Venezuela","Antigua and Barbuda","Argentina",
                           "Brazil","Chile","Cuba","Dominica","Ecuador",
                           "Grenada","Honduras","Mexico","Nicaragua","Paraguay",
                           "Peru","Saint Kitts and Nevis","Saint Lucia",
                           "Saint Vincent and the Grenadines","Uruguay",
                           "Venezuela"))
)

I appreciate your help in advance!

lucasgraybuck · June 27, 2021, 4:54pm

I think we can do this with a combination of dplyr and purrr:

library(dplyr)
library(purrr)

# First, make a list of the blocs, which can include overlaps: 
blocs <- list(
  "Pacific Alliance" = c("Mexico", "Colombia", "Peru", "Chile"),
  "Andean Community" = c("Colombia", "Peru", "Ecuador", "Bolivia")
)

# Then, we can make a function that gives us a data.frame 
# with the sum of TradeValue for each year within the bloc:
bloc_summary <- function(bloc) {
  df %>%
    filter(Partner %in% bloc) %>%
    group_by(Year) %>%
    summarise(TradeValue = sum(TradeValue))
}

# Finally, let's use purrr::map_dfr to assemble the final data.frame:
bloc_df <- map_dfr(
  blocs,
  bloc_summary,
  .id = "Bloc")

# The .id parameter will give us a new column with the name of each bloc
# taken from the names in the list.

Here's what we now have for bloc_df:

> bloc_df
# A tibble: 4 x 3
  Bloc              Year TradeValue
  <chr>            <int>      <dbl>
1 Pacific Alliance  2010 2676979412
2 Pacific Alliance  2011 4306811497
3 Andean Community  2010 2956375404
4 Andean Community  2011 3305458430

andreleite · June 27, 2021, 10:31pm

Thank you!!
It is working perfectly now

andresrcs · June 28, 2021, 4:59pm

Another option using pivoting instead of iteration

library(tidyverse)

bloc_df <- data.frame(
    Year = c(2011L,2011L,2011L,2011L,2011L,2011L,
             2011L,2011L,2011L,2011L,2011L,2011L,2011L,2011L,2011L,
             2011L,2011L,2011L,2010L,2010L,2010L,2010L,2010L,2010L,
             2010L,2010L,2010L,2010L,2010L,2010L,2010L,2010L,2010L,
             2010L,2010L,2010L),
    TradeValue = c(663536,305249525,1370308420,2205006468,
                   40354718,18525637,1908591669,1395268,293072616,704938268,
                   11219301,12729123,1396866761,776401,66465942,2833496,
                   22148147,1750409643,674346,130382390,1040262882,906924506,
                   34583261,2140942,1824535234,1272176,173326030,638214736,
                   7639368,10267257,1131840170,816590,4195982,2104475,14299677,
                   1422877052),
    Partner = as.factor(c("Antigua and Barbuda",
                          "Argentina","Brazil","Chile","Cuba","Dominica",
                          "Ecuador","Grenada","Honduras","Mexico",
                          "Nicaragua","Paraguay","Peru","Saint Kitts and Nevis",
                          "Saint Lucia","Saint Vincent and the Grenadines",
                          "Uruguay","Venezuela","Antigua and Barbuda","Argentina",
                          "Brazil","Chile","Cuba","Dominica","Ecuador",
                          "Grenada","Honduras","Mexico","Nicaragua","Paraguay",
                          "Peru","Saint Kitts and Nevis","Saint Lucia",
                          "Saint Vincent and the Grenadines","Uruguay",
                          "Venezuela"))
)

bloc_df %>% 
    mutate(`Pacific Alliance` = if_else(Partner %in% c("Mexico", "Colombia", "Peru", "Chile"), TRUE, FALSE),
           `Andean Community` = if_else(Partner %in% c("Colombia", "Peru", "Ecuador", "Bolivia"), TRUE, FALSE)) %>% 
    pivot_longer(where(is.logical), names_to = "Bloc", values_to = "flag") %>% 
    filter(flag) %>% 
    group_by(Bloc, Year) %>% 
    summarise(TradeValue = sum(TradeValue))
#> `summarise()` has grouped output by 'Bloc'. You can override using the `.groups` argument.
#> # A tibble: 4 x 3
#> # Groups:   Bloc [2]
#>   Bloc              Year TradeValue
#>   <chr>            <int>      <dbl>
#> 1 Andean Community  2010 2956375404
#> 2 Andean Community  2011 3305458430
#> 3 Pacific Alliance  2010 2676979412
#> 4 Pacific Alliance  2011 4306811497

^{Created on 2021-06-28 by the reprex package (v2.0.0)}

system · July 5, 2021, 4:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.