Data manipulation, generating new columns with data from an existing dataframe

I am working on some data manipulation practice and have hit a stumble with the data that I am trying to emulate. From the Gapminder data I am ultimately wanting to have my data represent the Americas, with new columns for gdp_2002 and gdp_2007. I know how to do calculation to receive a column similar to gdp_change, but am stuck:

What I want my manipulation to look like (also the first dataframe referenced [here] which I am trying to emulate. (https://www.onceupondata.com/2019/01/25/ggplot2-divergent-bars/)

## # A tibble: 25 x 4
##    country            year_1997 year_2007 gdp_change
##    <fct>                  <dbl>     <dbl>      <dbl>
##  1 Argentina             10967.    12779.      16.5 
##  2 Bolivia                3326.     3822.      14.9 
##  3 Brazil                 7958.     9066.      13.9 
##  4 Canada                28955.    36319.      25.4 
##  5 Chile                 10118.    13172.      30.2 
##  6 Colombia               6117.     7007.      14.5 
##  7 Costa Rica             6677.     9645.      44.5 
##  8 Cuba                   5432.     8948.      64.7 
##  9 Dominican Republic     3614.     6025.      66.7 
## 10 Ecuador                7429.     6873.      -7.49
## # ... with 15 more rows

And what I have so far is below, though I'm not sure these are correct steps. Essentially, I want to manipulate the data so I have a column that has all 2002 gdp values and all 2007 gdp values. Any help or suggestions on this kind of manipulation would be appreciated! Thanks!

gapminder %>% 
  filter(continent == "Americas") %>% 
  filter(year == "2002")
gapminder["gdp_2002"] <- NA
gapminder$gdp_2002 <- gapminder$gdpPercap

Hi @RichCL2,
I do hope that this is NOT homework!

library(gapminder)
suppressMessages(library(tidyverse))

# Year-by-year
gapminder %>%
  filter(continent == "Americas") %>%
  filter(year == "2002") -> y2002.df
head(y2002.df)
#> # A tibble: 6 x 6
#>   country   continent  year lifeExp       pop gdpPercap
#>   <fct>     <fct>     <int>   <dbl>     <int>     <dbl>
#> 1 Argentina Americas   2002    74.3  38331121     8798.
#> 2 Bolivia   Americas   2002    63.9   8445134     3413.
#> 3 Brazil    Americas   2002    71.0 179914212     8131.
#> 4 Canada    Americas   2002    79.8  31902268    33329.
#> 5 Chile     Americas   2002    77.9  15497046    10779.
#> 6 Colombia  Americas   2002    71.7  41008227     5755.

gapminder %>%
  filter(continent == "Americas") %>%
  filter(year == "2007") -> y2007.df
head(y2007.df)
#> # A tibble: 6 x 6
#>   country   continent  year lifeExp       pop gdpPercap
#>   <fct>     <fct>     <int>   <dbl>     <int>     <dbl>
#> 1 Argentina Americas   2007    75.3  40301927    12779.
#> 2 Bolivia   Americas   2007    65.6   9119152     3822.
#> 3 Brazil    Americas   2007    72.4 190010647     9066.
#> 4 Canada    Americas   2007    80.7  33390141    36319.
#> 5 Chile     Americas   2007    78.6  16284741    13172.
#> 6 Colombia  Americas   2007    72.9  44227550     7007.

left_join(y2002.df, y2007.df, by="country") %>% 
  select(c(1,2,6,11)) %>% 
  head(.)
#> # A tibble: 6 x 4
#>   country   continent.x gdpPercap.x gdpPercap.y
#>   <fct>     <fct>             <dbl>       <dbl>
#> 1 Argentina Americas          8798.      12779.
#> 2 Bolivia   Americas          3413.       3822.
#> 3 Brazil    Americas          8131.       9066.
#> 4 Canada    Americas         33329.      36319.
#> 5 Chile     Americas         10779.      13172.
#> 6 Colombia  Americas          5755.       7007.

# An all-in-one solution
gapminder %>%
  filter(continent == "Americas") %>%
  filter((year == "2002") | (year == "2007")) %>% 
  select(c(1,3,6)) %>% 
  pivot_wider(., names_from=year, values_from=gdpPercap, names_prefix="Year_") %>% 
  mutate(gdpchange = ((Year_2007 - Year_2002)/Year_2007)*100)
#> # A tibble: 25 x 4
#>    country            Year_2002 Year_2007 gdpchange
#>    <fct>                  <dbl>     <dbl>     <dbl>
#>  1 Argentina              8798.    12779.     31.2 
#>  2 Bolivia                3413.     3822.     10.7 
#>  3 Brazil                 8131.     9066.     10.3 
#>  4 Canada                33329.    36319.      8.23
#>  5 Chile                 10779.    13172.     18.2 
#>  6 Colombia               5755.     7007.     17.9 
#>  7 Costa Rica             7723.     9645.     19.9 
#>  8 Cuba                   6341.     8948.     29.1 
#>  9 Dominican Republic     4564.     6025.     24.3 
#> 10 Ecuador                5773.     6873.     16.0 
#> # ... with 15 more rows

Created on 2020-06-02 by the reprex package (v0.3.0)

HTH

Thank you for the help! No homework, just some practice manipulation exercises so my future homework portfolio looks even better!

Cheers

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.