There is an example of extracting data that I'm unable to replicate. This is the expected behavior, using unnest_longer to unpack titles for all GoT characters:
chars2 %>%
  select(name, title = titles) %>%
  unnest_longer(title)
#> # A tibble: 60 × 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy "Prince of Winterfell"
#> 2 Theon Greyjoy "Captain of Sea ■■■■■"
#> 3 Theon Greyjoy "Lord of the Iron Islands (by law of the green lands)"
#> 4 Tyrion Lannister "Acting Hand of the King (former)"
#> 5 Tyrion Lannister "Master of Coin (former)"
#> 6 Victarion Greyjoy "Lord Captain of the Iron Fleet"
#> 7 Victarion Greyjoy "Master of the Iron Victory"
#> 8 Will ""
#> 9 Areo Hotah "Captain of the Guard at Sunspear"
#> 10 Chett ""
#> # … with 50 more rows
However, when running the code, I get:
Error: Can't combine `..1$title` <list> and `..4$title` <character>
The error seems to be due to the JSON structure: the fourth row is a chr[1] while the others are lists, because the JSON does not wrap all instances of titles in arrays.
You're right, it works, but I've found my REAL problem: reading in the JSON myself, rather than using the tutorial example, leads to a different structure. How can read_json produce the same structure as the tutorial?
Please see the attached script for an example. Thanks!
Details
I was reading in the data using jsonlite::read_json(repurrrsive::got_chars_json()), in an effort to see how the data would look in the wild, preserving it in a nested-list state. However, when I do this, the nested data becomes lists of 1-length character vectors, whereas the tutorial data in got_chars has them as unnested character vectors (shows up as nested dropdowns when using View(), see attached script).
I understand that using simplifyVector = TRUE or fromJSON will simplify the raw data into vectors and it will work, but that is not the same structure that got_chars has in the tutorial. The simplifying methods lead directly to a data frame, while got_chars is still a nested list, but with flat character vectors for the nested fields.
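To make the difference concrete, here is a minimal sketch on a made-up two-record JSON string (the records and variable names are illustrative, not taken from got_chars_json()). It uses jsonlite::parse_json(), which parses a string rather than a file and, like read_json(), defaults to simplifyVector = FALSE:

```r
library(jsonlite)

# Made-up sample: two records, each with an array-valued "titles" field
txt <- '[{"name":"A","titles":["x","y"]},{"name":"B","titles":[""]}]'

# No simplification: every JSON array becomes an R list
nested <- parse_json(txt)
str(nested[[1]]$titles)

# Default simplification in fromJSON(): the array of records becomes a data frame
simplified <- fromJSON(txt)
str(simplified)
```

Neither result has the got_chars shape, which is a list of records whose array fields are plain character vectors.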
How can we go from the raw nested version of read_json to the structure we see in the tutorial with got_chars?
TODO
I'd really like to see an example using read_json, where we can use unnest_longer on fields.
I understand simplifyVector = TRUE or using fromJSON can get around this, bypassing the unnest_wider and creating a dataframe directly from the output, but what if we don't simplify?
How can we get from input_read_json to the format in input_tutorial (got_chars)?
library(tidyverse)
library(repurrrsive)
library(jsonlite)
# Compare Input Types
input_tutorial <- repurrrsive::got_chars
input_read_json <- read_json(repurrrsive::got_chars_json()) # read json in raw
tutorial_example <- function(input) {
  chars <- tibble(char = input)
  chars
  chars2 <- chars %>% unnest_wider(char)
  chars2
  chars2 %>%
    select(name, books, tvSeries) %>%
    pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
    unnest_longer(value)
}
# tutorial works, read_json fails
tutorial_example(input_tutorial) # works
tutorial_example(input_read_json) # fails
#> Error: Can't combine `..1$value` <list> and `..6$value` <character>.
#' seems to be due to structural differences
#' for the books and tvSeries fields
View(input_tutorial[[1]]) # they are character vectors
View(input_read_json[[1]]) # they are lists of 1-length character vectors
## TODO
# How can we get from `input_read_json` to the format in input_tutorial (`got_chars`)?
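One possible direction, without using simplifyVector = TRUE, is to collapse only those fields that are lists of length-1 atomic values into plain vectors and leave everything else alone. The sketch below uses base R only; simplify_field() and simplify_record() are hypothetical helper names, and the sample records are made up to mimic what read_json() returns. Whether the result matches got_chars for every field (nulls, empty arrays, mixed types) is an assumption I have not checked.

```r
# Made-up sample mimicking read_json() output: JSON arrays arrive as lists
raw <- list(
  list(name = "Theon Greyjoy",
       titles = list("Prince of Winterfell", "Lord of the Iron Islands")),
  list(name = "Will", titles = list(""), allegiances = list())
)

# Hypothetical helper: collapse a list of length-1 atomic values into a vector
simplify_field <- function(x) {
  is_scalar <- function(el) is.atomic(el) && length(el) == 1
  if (is.list(x) && length(x) > 0 && all(vapply(x, is_scalar, logical(1)))) {
    unlist(x, use.names = FALSE)  # mixed types would be coerced by unlist()
  } else {
    x  # leave scalars, empty lists, and deeper nesting untouched
  }
}

# Hypothetical helper: apply the field rule to each field of one record
simplify_record <- function(record) lapply(record, simplify_field)

flat <- lapply(raw, simplify_record)
str(flat)
```

With the real data, the same idea would be lapply(input_read_json, simplify_record), and the result could then be passed to tutorial_example() to see whether unnest_longer accepts it.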