Change a list in a list column to a dataframe, transpose it and add it to the original dataframe

purrr

#1

I'm trying to learn the api for Kolada - the swedish mumicipalities database. I have managed to download data and get the data in a dataframe, but I have a list column with a list inside that I have problems with. I want to change the list to a dataframe and then transpose it and add it to the original dataframe, see example:

library(httr)
library(tidyverse)

# Download example from Kolada:
url <- "http://api.kolada.se/v2/data/kpi/N00945/municipality/1860"
lresult <- content(GET(url))

ltrans <- transpose(lresult$values)


df <- map_df(.x = ltrans, .f = simplify)


# # A tibble: 22 x 4
# kpi    municipality period values    
# <chr>  <chr>         <int> <list>    
#   1 N00945 1860           1996 <list [1]>
#   2 N00945 1860           1997 <list [1]>
#   3 N00945 1860           1998 <list [1]>
#   4 N00945 1860           1999 <list [1]>
#   5 N00945 1860           2000 <list [1]>
#   6 N00945 1860           2001 <list [1]>
#   7 N00945 1860           2002 <list [1]>
#   8 N00945 1860           2003 <list [1]>
#   9 N00945 1860           2004 <list [1]>
#   10 N00945 1860           2005 <list [1]>
#   # ... with 12 more rows

# > str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	22 obs. of  4 variables:
#   $ kpi         : chr  "N00945" "N00945" "N00945" "N00945" ...
# $ municipality: chr  "1860" "1860" "1860" "1860" ...
# $ period      : int  1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 ...
# $ values      :List of 22
# ..$ :List of 1
# .. ..$ :List of 4
# .. .. ..$ count : int 1
# .. .. ..$ gender: chr "T"
# .. .. ..$ status: chr ""
# .. .. ..$ value : NULL
# ..$ :List of 1
# .. ..$ :List of 4
# .. .. ..$ count : int 1
# .. .. ..$ gender: chr "T"
# .. .. ..$ status: chr ""
# .. .. ..$ value : NULL

I want to change the list in the list column to a dataframe and then transpose it and add it to df. What I want is a tibble with 22 rows and the columns:

kpi
municipality
period
count
gender
status
value

There is problably a purrr solution, but everything I have tried have falild.


#2

Here is one option, which involves "unnesting" two times along with a mutate()/map() combo. The unnest() function is from package tidyr.

The first unnest() is to get rid of the top level of the nested list in values and instead have an unnested list containing the 4 values of interest.

Once that is done, that list column can be looped through via a map() loop within mutate() to first flatten the list via flatten() and bind it into a tibble with bind_rows().

Last, a final unnest() can be used to get that list of tibbles into the original dataset as new columns.

Here's what that looks like all together:

df %>%
     unnest(values) %>%
     mutate(values = map(values, ~flatten(.x) %>% bind_rows() ) ) %>%
     unnest(values)

# A tibble: 22 x 7
   kpi    municipality period count gender status value
   <chr>  <chr>         <int> <int> <chr>  <chr>  <dbl>
 1 N00945 1860           1996     1 T      ""      NA  
 2 N00945 1860           1997     1 T      ""      NA  
 3 N00945 1860           1998     1 T      ""      34.7
 4 N00945 1860           1999     1 T      ""      34.6
 5 N00945 1860           2000     1 T      ""      37.7
 6 N00945 1860           2001     1 T      ""      40.3
 7 N00945 1860           2002     1 T      ""      39.3
 8 N00945 1860           2003     1 T      ""      40.8
 9 N00945 1860           2004     1 T      ""      43.5
10 N00945 1860           2005     1 T      ""      44.9
# ... with 12 more rows

#3

When data comes in a nasty arrangement, the automatic processing done by jsonlite::fromJSON may not be the best choice. By looking at the basic list version, we can come up with a function to turn one element into a data row.

lresult <- content(GET(url), as = "parsed", simplifyDataFrame = FALSE)
str(lresult[["values"]][[1]])
# List of 4
#  $ kpi         : chr "N00945"
#  $ municipality: chr "1860"
#  $ period      : int 1996
#  $ values      :List of 1
#   ..$ :List of 4
#   .. ..$ count : int 1
#   .. ..$ gender: chr "T"
#   .. ..$ status: chr ""
#   .. ..$ value : NULL

So each observation is broken into two lists. We can make a data frame from each, then bind the columns together. We also have to convert the NULL to NA when it pops up.

process_value_list <- function(vlist) {
  basics  <- as_data_frame(vlist[c("kpi", "municipality", "period")])
  values <- vlist[["values"]][[1]]
  if (is.null(values[["value"]])) {
    values[["value"]] <- NA
  }
  details <- as_data_frame(values)
  cbind(basics, details)
}

process_value_list(lresult[["values"]][[1]])
#      kpi municipality period count gender status value
# 1 N00945         1860   1996     1      T           NA

Since that works, we can loop over the whole list and combine the results.

dframe <- lresult[["values"]] %>%
  lapply(process_value_list) %>%
  bind_rows()

dframe
#       kpi municipality period count gender status    value
# 1  N00945         1860   1996     1      T              NA
# 2  N00945         1860   1997     1      T              NA
# 3  N00945         1860   1998     1      T        34.68172
# 4  N00945         1860   1999     1      T        34.59331
# 5  N00945         1860   2000     1      T        37.69174
# 6  N00945         1860   2001     1      T        40.26130
# ...

#4

Thank you very much for two great solutions. You saved my day! Perhaps the purrr solution is a bit less specific than nwerts solution and can be more easily adapted to other querries to the api. . On the other hand nwerts solution means less data wrangling and is maybe faster.