Consider this very simple tibble
tibble(horrible_python_dict = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}")
# A tibble: 1 x 1
horrible_python_dict
<chr>
1 "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}"
As you can see, my tibble
contains a good-ol python
dictionary encoded as a string!
How can I convert this horrible string to a nice looking R list-column
? That is, the ideal output should be
tibble(better_list_col = list(list('Java' = 1,
'Stata' = 2,
'Texas Instrument' = 12,
'Pearl' = 900,
'GameBoy' = 2)))
# A tibble: 1 x 1
better_list_col
<list>
1 <list [5]>
Do I necessarily have to use some regex
magic here or there is something simpler?
Thanks!
The python syntax is close enough to JSON that you can use a JSON parser in many cases.
str <- "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}"
jsonlite::parse_json(str)
#> $Java
#> [1] 1
#>
#> $Stata
#> [1] 2
#>
#> $`Texas Instrument`
#> [1] 13
#>
#> $Pearl
#> [1] 900
#>
#> $GameBoy
#> [1] 2
d <- data.frame(values = I(list("{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}")),
stringsAsFactors = FALSE)
d$values <- lapply(d$values, jsonlite::parse_json)
d$values[[1]]
#> $Java
#> [1] 1
#>
#> $Stata
#> [1] 2
#>
#> $`Texas Instrument`
#> [1] 13
#>
#> $Pearl
#> [1] 900
#>
#> $GameBoy
#> [1] 2
1 Like
ha! well done! I am glad I asked because I was already in regex hell thanks!!!
Only issue is that it seems I cannot use it inside a tibble directly
tibble(horrible_python_dict = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}") %>%
mutate(list_col = jsonlite::parse_json(horrible_python_dict))
Error: Column `list_col` must be length 1 (the number of rows), not 5
OK got it @JohnMount . A simple variant with purrr
did the job. Thanks!
tibble(horrible_python_dict = c("{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
"{\"John\":1,\"Mount\":2}")) %>%
mutate(list_col = map(horrible_python_dict, ~jsonlite::parse_json(.x)))
It is a case where dplyr isn't making things easier than base-R. But here is an example.
d <- data.frame(python_str = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
stringsAsFactors = FALSE)
d$list_col <- lapply(d$python_str, jsonlite::parse_json)
d$list_col[[1]]
#> $Java
#> [1] 1
#>
#> $Stata
#> [1] 2
#>
#> $`Texas Instrument`
#> [1] 13
#>
#> $Pearl
#> [1] 900
#>
#> $GameBoy
#> [1] 2
library("dplyr")
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- data.frame(python_str = "{\"Java\":1,\"Stata\":2,\"Texas Instrument\":13,\"Pearl\":900,\"GameBoy\":2}",
stringsAsFactors = FALSE)
d <- d %>% mutate(list_col = lapply(python_str, jsonlite::parse_json))
d$list_col[[1]]
#> $Java
#> [1] 1
#>
#> $Stata
#> [1] 2
#>
#> $`Texas Instrument`
#> [1] 13
#>
#> $Pearl
#> [1] 900
#>
#> $GameBoy
#> [1] 2
Ah, crossed messages in typing. Sounds like you got it!
1 Like
system
Closed
May 29, 2019, 6:32pm
8
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.