I'm trying to learn the api for Kolada - the swedish mumicipalities database. I have managed to download data and get the data in a dataframe, but I have a list column with a list inside that I have problems with. I want to change the list to a dataframe and then transpose it and add it to the original dataframe, see example:
I want to change the list in the list column to a dataframe and then transpose it and add it to df. What I want is a tibble with 22 rows and the columns:
kpi
municipality
period
count
gender
status
value
There is problably a purrr solution, but everything I have tried have falild.
Here is one option, which involves "unnesting" two times along with a mutate()/map() combo. The unnest() function is from package tidyr.
The first unnest() is to get rid of the top level of the nested list in values and instead have an unnested list containing the 4 values of interest.
Once that is done, that list column can be looped through via a map() loop within mutate() to first flatten the list via flatten() and bind it into a tibble with bind_rows().
Last, a final unnest() can be used to get that list of tibbles into the original dataset as new columns.
Here's what that looks like all together:
df %>%
unnest(values) %>%
mutate(values = map(values, ~flatten(.x) %>% bind_rows() ) ) %>%
unnest(values)
# A tibble: 22 x 7
kpi municipality period count gender status value
<chr> <chr> <int> <int> <chr> <chr> <dbl>
1 N00945 1860 1996 1 T "" NA
2 N00945 1860 1997 1 T "" NA
3 N00945 1860 1998 1 T "" 34.7
4 N00945 1860 1999 1 T "" 34.6
5 N00945 1860 2000 1 T "" 37.7
6 N00945 1860 2001 1 T "" 40.3
7 N00945 1860 2002 1 T "" 39.3
8 N00945 1860 2003 1 T "" 40.8
9 N00945 1860 2004 1 T "" 43.5
10 N00945 1860 2005 1 T "" 44.9
# ... with 12 more rows
When data comes in a nasty arrangement, the automatic processing done by jsonlite::fromJSON may not be the best choice. By looking at the basic list version, we can come up with a function to turn one element into a data row.
lresult <- content(GET(url), as = "parsed", simplifyDataFrame = FALSE)
str(lresult[["values"]][[1]])
# List of 4
# $ kpi : chr "N00945"
# $ municipality: chr "1860"
# $ period : int 1996
# $ values :List of 1
# ..$ :List of 4
# .. ..$ count : int 1
# .. ..$ gender: chr "T"
# .. ..$ status: chr ""
# .. ..$ value : NULL
So each observation is broken into two lists. We can make a data frame from each, then bind the columns together. We also have to convert the NULL to NA when it pops up.
process_value_list <- function(vlist) {
basics <- as_data_frame(vlist[c("kpi", "municipality", "period")])
values <- vlist[["values"]][[1]]
if (is.null(values[["value"]])) {
values[["value"]] <- NA
}
details <- as_data_frame(values)
cbind(basics, details)
}
process_value_list(lresult[["values"]][[1]])
# kpi municipality period count gender status value
# 1 N00945 1860 1996 1 T NA
Since that works, we can loop over the whole list and combine the results.
dframe <- lresult[["values"]] %>%
lapply(process_value_list) %>%
bind_rows()
dframe
# kpi municipality period count gender status value
# 1 N00945 1860 1996 1 T NA
# 2 N00945 1860 1997 1 T NA
# 3 N00945 1860 1998 1 T 34.68172
# 4 N00945 1860 1999 1 T 34.59331
# 5 N00945 1860 2000 1 T 37.69174
# 6 N00945 1860 2001 1 T 40.26130
# ...
Thank you very much for two great solutions. You saved my day! Perhaps the purrr solution is a bit less specific than nwerts solution and can be more easily adapted to other querries to the api. . On the other hand nwerts solution means less data wrangling and is maybe faster.