how to convert python dictionaries to proper list-columns in a tibble?

Consider this very simple tibble


tibble(horrible_python_dict = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}")
# A tibble: 1 x 1
  horrible_python_dict                                                          
  <chr>                                                                         
1 "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}"

As you can see, my tibble contains a good-ol python dictionary encoded as a string!

How can I convert this horrible string to a nice looking R list-column? That is, the ideal output should be


tibble(better_list_col = list(list('Java' = 1,
                              'Stata' = 2,
                              'Texas Instrument' = 12,
                              'Pearl' = 900,
                              'GameBoy' = 2)))
# A tibble: 1 x 1
  better_list_col
  <list>         
1 <list [5]>  

Do I necessarily have to use some regex magic here or there is something simpler?

Thanks!

The python syntax is close enough to JSON that you can use a JSON parser in many cases.

str <- "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}"
jsonlite::parse_json(str)
#> $Java
#> [1] 1
#> 
#> $Stata
#> [1] 2
#> 
#> $`Texas Instrument`
#> [1] 13
#> 
#> $Pearl
#> [1] 900
#> 
#> $GameBoy
#> [1] 2
d <- data.frame(values = I(list("{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}")),
                stringsAsFactors = FALSE)
d$values <- lapply(d$values, jsonlite::parse_json)
d$values[[1]]
#> $Java
#> [1] 1
#> 
#> $Stata
#> [1] 2
#> 
#> $`Texas Instrument`
#> [1] 13
#> 
#> $Pearl
#> [1] 900
#> 
#> $GameBoy
#> [1] 2
1 Like

ha! well done! I am glad I asked because I was already in regex hell :slight_smile: thanks!!!

Only issue is that it seems I cannot use it inside a tibble directly


tibble(horrible_python_dict = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}") %>% 
  mutate(list_col = jsonlite::parse_json(horrible_python_dict))
Error: Column `list_col` must be length 1 (the number of rows), not 5

OK got it @JohnMount. A simple variant with purrr did the job. Thanks!

tibble(horrible_python_dict = c("{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
                                "{\"John\":1,\"Mount\":2}")) %>% 
  mutate(list_col = map(horrible_python_dict, ~jsonlite::parse_json(.x)))

It is a case where dplyr isn't making things easier than base-R. But here is an example.

d <- data.frame(python_str = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
                stringsAsFactors = FALSE)
d$list_col <- lapply(d$python_str, jsonlite::parse_json)
d$list_col[[1]]
#> $Java
#> [1] 1
#> 
#> $Stata
#> [1] 2
#> 
#> $`Texas Instrument`
#> [1] 13
#> 
#> $Pearl
#> [1] 900
#> 
#> $GameBoy
#> [1] 2

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
d <- data.frame(python_str = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
                stringsAsFactors = FALSE)
d <- d %>% mutate(list_col = lapply(python_str, jsonlite::parse_json))
d$list_col[[1]]
#> $Java
#> [1] 1
#> 
#> $Stata
#> [1] 2
#> 
#> $`Texas Instrument`
#> [1] 13
#> 
#> $Pearl
#> [1] 900
#> 
#> $GameBoy
#> [1] 2

Ah, crossed messages in typing. Sounds like you got it!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.