Data manipulation, generating new columns with data from an existing dataframe

I am working on some data manipulation practice and have hit a stumble with the data that I am trying to emulate. From the Gapminder data I am ultimately wanting to have my data represent the Americas, with new columns for gdp_2002 and gdp_2007. I know how to do calculation to receive a column similar to gdp_change, but am stuck:

What I want my manipulation to look like (also the first dataframe referenced [here] which I am trying to emulate. (https://www.onceupondata.com/2019/01/25/ggplot2-divergent-bars/)

## # A tibble: 25 x 4
##    country            year_1997 year_2007 gdp_change
##    <fct>                  <dbl>     <dbl>      <dbl>
##  1 Argentina             10967.    12779.      16.5 
##  2 Bolivia                3326.     3822.      14.9 
##  3 Brazil                 7958.     9066.      13.9 
##  4 Canada                28955.    36319.      25.4 
##  5 Chile                 10118.    13172.      30.2 
##  6 Colombia               6117.     7007.      14.5 
##  7 Costa Rica             6677.     9645.      44.5 
##  8 Cuba                   5432.     8948.      64.7 
##  9 Dominican Republic     3614.     6025.      66.7 
## 10 Ecuador                7429.     6873.      -7.49
## # ... with 15 more rows

And what I have so far is below, though I'm not sure these are correct steps. Essentially, I want to manipulate the data so I have a column that has all 2002 gdp values and all 2007 gdp values. Any help or suggestions on this kind of manipulation would be appreciated! Thanks!

gapminder %>% 
  filter(continent == "Americas") %>% 
  filter(year == "2002")
gapminder["gdp_2002"] <- NA
gapminder$gdp_2002 <- gapminder$gdpPercap

Hi @RichCL2,
I do hope that this is NOT homework!

library(gapminder)
suppressMessages(library(tidyverse))

# Year-by-year
gapminder %>%
  filter(continent == "Americas") %>%
  filter(year == "2002") -> y2002.df
head(y2002.df)
#> # A tibble: 6 x 6
#>   country   continent  year lifeExp       pop gdpPercap
#>   <fct>     <fct>     <int>   <dbl>     <int>     <dbl>
#> 1 Argentina Americas   2002    74.3  38331121     8798.
#> 2 Bolivia   Americas   2002    63.9   8445134     3413.
#> 3 Brazil    Americas   2002    71.0 179914212     8131.
#> 4 Canada    Americas   2002    79.8  31902268    33329.
#> 5 Chile     Americas   2002    77.9  15497046    10779.
#> 6 Colombia  Americas   2002    71.7  41008227     5755.

gapminder %>%
  filter(continent == "Americas") %>%
  filter(year == "2007") -> y2007.df
head(y2007.df)
#> # A tibble: 6 x 6
#>   country   continent  year lifeExp       pop gdpPercap
#>   <fct>     <fct>     <int>   <dbl>     <int>     <dbl>
#> 1 Argentina Americas   2007    75.3  40301927    12779.
#> 2 Bolivia   Americas   2007    65.6   9119152     3822.
#> 3 Brazil    Americas   2007    72.4 190010647     9066.
#> 4 Canada    Americas   2007    80.7  33390141    36319.
#> 5 Chile     Americas   2007    78.6  16284741    13172.
#> 6 Colombia  Americas   2007    72.9  44227550     7007.

left_join(y2002.df, y2007.df, by="country") %>% 
  select(c(1,2,6,11)) %>% 
  head(.)
#> # A tibble: 6 x 4
#>   country   continent.x gdpPercap.x gdpPercap.y
#>   <fct>     <fct>             <dbl>       <dbl>
#> 1 Argentina Americas          8798.      12779.
#> 2 Bolivia   Americas          3413.       3822.
#> 3 Brazil    Americas          8131.       9066.
#> 4 Canada    Americas         33329.      36319.
#> 5 Chile     Americas         10779.      13172.
#> 6 Colombia  Americas          5755.       7007.

# An all-in-one solution
gapminder %>%
  filter(continent == "Americas") %>%
  filter((year == "2002") | (year == "2007")) %>% 
  select(c(1,3,6)) %>% 
  pivot_wider(., names_from=year, values_from=gdpPercap, names_prefix="Year_") %>% 
  mutate(gdpchange = ((Year_2007 - Year_2002)/Year_2007)*100)
#> # A tibble: 25 x 4
#>    country            Year_2002 Year_2007 gdpchange
#>    <fct>                  <dbl>     <dbl>     <dbl>
#>  1 Argentina              8798.    12779.     31.2 
#>  2 Bolivia                3413.     3822.     10.7 
#>  3 Brazil                 8131.     9066.     10.3 
#>  4 Canada                33329.    36319.      8.23
#>  5 Chile                 10779.    13172.     18.2 
#>  6 Colombia               5755.     7007.     17.9 
#>  7 Costa Rica             7723.     9645.     19.9 
#>  8 Cuba                   6341.     8948.     29.1 
#>  9 Dominican Republic     4564.     6025.     24.3 
#> 10 Ecuador                5773.     6873.     16.0 
#> # ... with 15 more rows

Created on 2020-06-02 by the reprex package (v0.3.0)

HTH

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Thank you for the help! No homework, just some practice manipulation exercises so my future homework portfolio looks even better!

Cheers

1 Like