Cleaning world bank data in R (pivot_wider?)

Hey there!

I downloaded a CSV file from the world bank database with many variables.

Previously I used the pivot_longer function in order to create a column with years as variables and their respective values as another column.

However, now I need to create a new variable(column) for each of the fields I downloaded for my CSV. is there a specific function or method for solving this issue?

I think I can do it with pivot wider but I don't know how to use it specifically to create as many fields as needed.

Can you provide a reproducible example rather than a screenshot?

Can you provide an example of what your intended output is?

perfect here's the main idea:

world_data <- read.csv('C:/Users/lol/Desktop/Health nutrition.csv')
#> Warning in file(file, "rt"): cannot open file 'C:/Users/lol/Desktop/Health
#> nutrition.csv': No such file or directory
#> Error in file(file, "rt"): cannot open the connection

world_data <- pivot_longer(data = world_data, names_to = 'Year', values_to = 'value',
                           "X2006..YR2006.":'X2020..YR2020.')
#> Error in pivot_longer(data = world_data, names_to = "Year", values_to = "value", : could not find function "pivot_longer"
head(world_data)
#> Error in head(world_data): object 'world_data' not found
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
head(diamonds)
#> # A tibble: 6 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#> 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#> 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#> 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#> 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#> 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48


#Every field: table, price, x, y, dept is a column/variable
#with their respective values

#What I need is in my world_data dataframe to create a column for each series name
#in my ï..Series.Name and add their respective values without altering the years.

Created on 2021-08-17 by the reprex package (v2.0.1)

I think I get what you want. Can you do a reprex of your dataset? We can't read your csv.

It would be something like this anyway:


# like this, but use the names from your data
data %>% 
  select(country:year, series, value) %>% 
  pivot_wider(-(country, country_code, year), names_from = "series", values_from = "value")

The ~ï..Series.Name is the column/field where I have all the variables such as Adolescent fertility rate, public spending on education, and so on.

tibble::tribble(
                                                  ~ï..Series.Name, ~Country.Name, ~Country.Code,            ~Year,    ~value,
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2006..YR2006.",  "63.162",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2007..YR2007.",  "62.938",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2008..YR2008.", "63.0638",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2009..YR2009.", "63.1896",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2010..YR2010.", "63.3154",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2011..YR2011.", "63.4412",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2012..YR2012.",  "63.567",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2013..YR2013.",   "63.41",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2014..YR2014.",  "63.253",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2015..YR2015.",  "63.096",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2016..YR2016.",  "62.939",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2017..YR2017.",  "62.782",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2018..YR2018.", "62.5654",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2019..YR2019.", "62.3488",
  "Adolescent fertility rate (births per 1,000 women ages 15-19)",   "Argentina",         "ARG", "X2020..YR2020.",      "..",
                                  "Age at first marriage, female",   "Argentina",         "ARG", "X2006..YR2006.",      "..",
                                  "Age at first marriage, female",   "Argentina",         "ARG", "X2007..YR2007.",      "..",
                                  "Age at first marriage, female",   "Argentina",         "ARG", "X2008..YR2008.",      "..",
                                  "Age at first marriage, female",   "Argentina",         "ARG", "X2009..YR2009.",      "..",
                                  "Age at first marriage, female",   "Argentina",         "ARG", "X2010..YR2010.",    "24.6"
  )
#> # A tibble: 20 x 5
#>    ï..Series.Name                      Country.Name Country.Code Year      value
#>    <chr>                               <chr>        <chr>        <chr>     <chr>
#>  1 Adolescent fertility rate (births ~ Argentina    ARG          X2006..Y~ 63.1~
#>  2 Adolescent fertility rate (births ~ Argentina    ARG          X2007..Y~ 62.9~
#>  3 Adolescent fertility rate (births ~ Argentina    ARG          X2008..Y~ 63.0~
#>  4 Adolescent fertility rate (births ~ Argentina    ARG          X2009..Y~ 63.1~
#>  5 Adolescent fertility rate (births ~ Argentina    ARG          X2010..Y~ 63.3~
#>  6 Adolescent fertility rate (births ~ Argentina    ARG          X2011..Y~ 63.4~
#>  7 Adolescent fertility rate (births ~ Argentina    ARG          X2012..Y~ 63.5~
#>  8 Adolescent fertility rate (births ~ Argentina    ARG          X2013..Y~ 63.41
#>  9 Adolescent fertility rate (births ~ Argentina    ARG          X2014..Y~ 63.2~
#> 10 Adolescent fertility rate (births ~ Argentina    ARG          X2015..Y~ 63.0~
#> 11 Adolescent fertility rate (births ~ Argentina    ARG          X2016..Y~ 62.9~
#> 12 Adolescent fertility rate (births ~ Argentina    ARG          X2017..Y~ 62.7~
#> 13 Adolescent fertility rate (births ~ Argentina    ARG          X2018..Y~ 62.5~
#> 14 Adolescent fertility rate (births ~ Argentina    ARG          X2019..Y~ 62.3~
#> 15 Adolescent fertility rate (births ~ Argentina    ARG          X2020..Y~ ..   
#> 16 Age at first marriage, female       Argentina    ARG          X2006..Y~ ..   
#> 17 Age at first marriage, female       Argentina    ARG          X2007..Y~ ..   
#> 18 Age at first marriage, female       Argentina    ARG          X2008..Y~ ..   
#> 19 Age at first marriage, female       Argentina    ARG          X2009..Y~ ..   
#> 20 Age at first marriage, female       Argentina    ARG          X2010..Y~ 24.6

Created on 2021-08-17 by the reprex package (v2.0.1)

So these rows are the ones I want to change from rows each into columns. (I have like 6 different data series or series names) so It would mean that I'd have 6 columns.

I tried your code and It's giving me an error.

Like this perhaps?

df %>% 
  select(Country.Name:Year, Series.Name = ï..Series.Name, value) %>% 
  pivot_wider(c(Country.Name, Country.Code, Year), names_from = "Series.Name", values_from = "value")


# A tibble: 15 x 5
   Country.Name Country.Code Year           `Adolescent fertility rate (births per 1,000 women ages 15-19)` `Age at first marriage, female`
   <chr>        <chr>        <chr>          <chr>                                                           <chr>                          
 1 Argentina    ARG          X2006..YR2006. 63.162                                                          ..                             
 2 Argentina    ARG          X2007..YR2007. 62.938                                                          ..                             
 3 Argentina    ARG          X2008..YR2008. 63.0638                                                         ..                             
 4 Argentina    ARG          X2009..YR2009. 63.1896                                                         ..                             
 5 Argentina    ARG          X2010..YR2010. 63.3154                                                         24.6                           
 6 Argentina    ARG          X2011..YR2011. 63.4412                                                         NA                             
 7 Argentina    ARG          X2012..YR2012. 63.567                                                          NA                             
 8 Argentina    ARG          X2013..YR2013. 63.41                                                           NA                             
 9 Argentina    ARG          X2014..YR2014. 63.253                                                          NA                             
10 Argentina    ARG          X2015..YR2015. 63.096                                                          NA                             
11 Argentina    ARG          X2016..YR2016. 62.939                                                          NA                             
12 Argentina    ARG          X2017..YR2017. 62.782                                                          NA                             
13 Argentina    ARG          X2018..YR2018. 62.5654                                                         NA                             
14 Argentina    ARG          X2019..YR2019. 62.3488                                                         NA                             
15 Argentina    ARG          X2020..YR2020. ..                                                              NA  
1 Like

Thank you very much man, you helped me a lot.

Apreciate it.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.