tidyr Rectangling Vignette Example using JSON data

zlb · October 26, 2021, 7:26pm

I'm working through the tidyr rectangling vignette and finding that the Game of Thrones example doesn't work, which is a shame. Maybe you can help?

There is one example of extracting data that I'm unable to replicate. This is the expected behavior, using unnest_longer to unpack the titles of all GoT characters:

chars2 %>% 
  select(name, title = titles) %>% 
  unnest_longer(title)
#> # A tibble: 60 × 2
#>    name              title                                                 
#>    <chr>             <chr>                                                 
#>  1 Theon Greyjoy     "Prince of Winterfell"                                
#>  2 Theon Greyjoy     "Captain of Sea ■■■■■"                                
#>  3 Theon Greyjoy     "Lord of the Iron Islands (by law of the green lands)"
#>  4 Tyrion Lannister  "Acting Hand of the King (former)"                    
#>  5 Tyrion Lannister  "Master of Coin (former)"                             
#>  6 Victarion Greyjoy "Lord Captain of the Iron Fleet"                      
#>  7 Victarion Greyjoy "Master of the Iron Victory"                          
#>  8 Will              ""                                                    
#>  9 Areo Hotah        "Captain of the Guard at Sunspear"                    
#> 10 Chett             ""                                                    
#> # … with 50 more rows

However, when running the code, I get:

Error: Can't combine `..1$title` <list> and `..4$title` <character>

The error seems to come from the structure of the JSON: it does not wrap every titles value in an array, so some rows of title are lists while others (starting with the fourth row) are chr [1] character vectors:

chars2 %>% 
  select(name, title = titles)
#> # A tibble: 30 × 2
#>    name               title     
#>    <chr>              <list>    
#>  1 Theon Greyjoy      <list [3]>
#>  2 Tyrion Lannister   <list [2]>
#>  3 Victarion Greyjoy  <list [2]>
#>  4 Will               <chr [1]> 
#>  5 Areo Hotah         <chr [1]> 
#>  6 Chett              <chr [1]> 
#>  7 Cressen            <chr [1]> 
#>  8 Arianne Martell    <chr [1]> 
#>  9 Daenerys Targaryen <list [5]>
#> 10 Davos Seaworth     <list [4]>
#> # … with 20 more rows
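A quick way to confirm which rows trigger the error is to check the class of each element of the list-column. This is a minimal sketch on hypothetical toy data shaped like the title column above (the values are made up, not taken from got_chars):

```r
library(purrr)

# Toy stand-in for the `title` list-column: some elements are lists,
# others are plain character vectors (values are hypothetical)
title <- list(list("a", "b", "c"), list("d", "e"), "", "f")

# Class of each element; a mix of "list" and "character" is what
# unnest_longer() fails to combine
elem_class <- map_chr(title, ~ class(.x)[1])
print(elem_class)

# Positions of the elements that are not lists
not_list <- which(!map_lgl(title, is.list))
print(not_list)
```

On the real data the same check can be run on chars2$titles.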

Is there a way to get this example working again? Much appreciated.

TODO

Rerun the GoT examples from the tidyr rectangling vignette. Do you get an error?
How can rectangling methods handle JSON fields that can be represented as empty strings, single strings, or arrays of strings?
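One way to handle a field that may be an empty string, a single string, or an array of strings is to coerce every value to a plain character vector before unnesting, so all rows share one type. A minimal sketch; as_chr_vec() is a hypothetical helper, and dropping empty strings is only a design choice (the vignette output keeps them as ""):

```r
library(purrr)

# Hypothetical helper: normalize one JSON field value to a character vector.
# Handles "", "x", list("x", "y"), list() and NULL.
as_chr_vec <- function(x, drop_empty = TRUE) {
  out <- as.character(unlist(x))  # unlist(list()) and unlist(NULL) give NULL -> character(0)
  if (drop_empty) out <- out[nzchar(out)]
  out
}

values <- list("", "Lord of Winterfell", list("Prince", "Captain"), list())
normalized <- map(values, as_chr_vec)
str(normalized)
```

Once every element is a character vector, unnest_longer() no longer has to combine <list> with <character>.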

williaml · October 26, 2021, 9:04pm

Hi. It seems to work fine for me. What version of tidyr do you have installed?

packageVersion("tidyr")
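Since unnest_longer() and unnest_wider() first appeared in tidyr 1.0.0, and their behavior may differ between releases, it can help to report the versions of every package involved. A small sketch:

```r
# Versions of the packages used in this example (only those installed)
pkgs <- c("tidyr", "jsonlite", "repurrrsive")
installed <- pkgs[vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))]
versions <- vapply(installed, function(p) as.character(packageVersion(p)), character(1))
print(versions)

# unnest_longer()/unnest_wider() need tidyr >= 1.0.0
has_rectangling <- "tidyr" %in% installed && packageVersion("tidyr") >= "1.0.0"
print(has_rectangling)
```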

zlb · October 27, 2021, 5:09pm

You're right, it works. But I've found my REAL problem: reading in the JSON myself gives a different structure from the tutorial data. How can read_json produce the same structure as the tutorial?

Please see the attached script for an example. Thanks!


I was reading in the data with jsonlite::read_json(repurrrsive::got_chars_json()) to see how the data would look in the wild, keeping it as a nested list. However, when I do this, the nested fields become lists of length-1 character vectors, whereas in the tutorial data (got_chars) they are plain character vectors (the difference shows up as nested dropdowns in View(); see the attached script).

I understand that simplifyVector = TRUE or fromJSON() will simplify the raw data into vectors and make it work, but that does not give the same structure as got_chars in the tutorial. The simplify options lead directly to a data frame, whereas got_chars is still a nested list, just with flat character vectors for the nested fields.
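The difference comes down to jsonlite's simplifyVector argument: read_json() and parse_json() default to simplifyVector = FALSE, so every JSON array becomes an R list, whereas fromJSON() defaults to TRUE. A minimal sketch on a small made-up JSON string shaped like got_chars:

```r
library(jsonlite)

# Made-up JSON: an array of objects, each with an array of strings
txt <- '[{"name": "A", "titles": ["x", "y"]},
         {"name": "B", "titles": ["z"]}]'

raw  <- parse_json(txt)                          # simplifyVector = FALSE (default)
simp <- parse_json(txt, simplifyVector = TRUE)   # same default as fromJSON()

str(raw[[1]]$titles)   # a list of length-1 character vectors
class(simp)            # a data.frame; titles becomes a list-column
```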

How can we go from the raw nested version of read_json to the structure we see in the tutorial with got_chars?

TODO

I'd really like to see an example that uses read_json and still lets us call unnest_longer on fields.
I understand that simplifyVector = TRUE or fromJSON can get around this, skipping unnest_wider and creating a data frame directly from the output, but what if we don't simplify?
How can we get from input_read_json to the format in input_tutorial (got_chars)?
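One option that may get close to got_chars without producing a data frame, assuming jsonlite's documented simplifyDataFrame argument behaves as described: keep simplifyVector = TRUE so arrays of strings become character vectors, but set simplifyDataFrame = FALSE so the top-level array of objects stays a list of named lists. This is sketched on toy JSON only; it has not been checked against got_chars field by field (empty arrays, for instance, may come back as list()).

```r
library(jsonlite)

txt <- '[{"name": "A", "titles": ["x", "y"]},
         {"name": "B", "titles": ["z"]}]'

# Simplify string arrays to character vectors, but do not build a data frame
chars <- parse_json(txt, simplifyVector = TRUE, simplifyDataFrame = FALSE)
str(chars[[1]])

# The same arguments should pass through read_json() for the real file:
# read_json(repurrrsive::got_chars_json(), simplifyVector = TRUE, simplifyDataFrame = FALSE)
```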

library(tidyverse)
library(repurrrsive)
library(jsonlite)

# Compare Input Types

input_tutorial <- repurrrsive::got_chars
input_read_json <- read_json(repurrrsive::got_chars_json()) # read json in raw


tutorial_example <- function(input) {
  # one row per character, with each raw record in a list-column
  chars <- tibble(char = input)

  # one column per field
  chars2 <- chars %>% unnest_wider(char)

  # one row per book / TV season
  chars2 %>%
    select(name, books, tvSeries) %>%
    pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
    unnest_longer(value)
}

# tutorial works, read_json fails
tutorial_example(input_tutorial) # works
tutorial_example(input_read_json) # fails
#>  Error: Can't combine `..1$value` <list> and `..6$value` <character>.

#' seems to be due to structural differences
#' for the books and tvSeries fields
View(input_tutorial[[1]]) # they are character vectors
View(input_read_json[[1]]) # they are lists of 1-length character vectors

## TODO
#   How can we get from `input_read_json` to the format in `input_tutorial` (`got_chars`)?
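If View() isn't available (for example in a script or a reprex), the same structural difference can be checked with class(). A short sketch, assuming repurrrsive's got_chars and got_chars_json() as used in the script above:

```r
library(jsonlite)
library(repurrrsive)

input_tutorial  <- repurrrsive::got_chars
input_read_json <- read_json(repurrrsive::got_chars_json())

# Same field, first character, two different shapes
class(input_tutorial[[1]]$books)
class(input_read_json[[1]]$books)

# Which fields of the first character differ in type between the two inputs?
fields <- names(input_tutorial[[1]])
differs <- fields[vapply(fields, function(f)
  !identical(class(input_tutorial[[1]][[f]]), class(input_read_json[[1]][[f]])),
  logical(1))]
print(differs)
```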

nirgrahamuk · October 28, 2021, 9:48am

Here is a first attempt; hopefully someone has a better, more elegant solution.

library(tidyverse)
library(repurrrsive)
library(jsonlite)

# Compare Input Types

input_tutorial <- repurrrsive::got_chars
input_read_json <- read_json(repurrrsive::got_chars_json()) # read json in raw


part_1 <- function(input) {
  # one row per character, then one column per field
  tibble(char = input) %>% unnest_wider(char)
}
part_2 <- function(input) {
  input %>%
    select(name, books, tvSeries) %>%
    pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
    unnest_longer(value)
}


(it_p1 <- part_1(input_tutorial) )
(irj_p1 <- part_1(input_read_json) )

# tutorial works, read_json fails
(it_p2 <- part_2(it_p1))
try(part_2(irj_p1))  # fails: mixes <list> and <character> values



# For every column that contains list elements (e.g. list("A", "B")),
# convert each element to a character vector, then rerun part 2
irj_p2 <- map(irj_p1, ~ {
  x <- .x
  if (any(map_lgl(x, is.list))) {
    # paste0() turns list("A", "B") into c("A", "B") and list() into character(0)
    x <- map(x, paste0)
  }
  x
}) %>%
  as_tibble() %>%
  part_2()

waldo::compare(it_p2,irj_p2)
#√ No differences
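A possibly more compact variant of the same idea, assuming dplyr >= 1.0.0 for across() and where(): flatten every list-column element that is itself a list into a character vector. Sketched on hypothetical toy data; list_to_chr() is a made-up helper name.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: list("x", "y") -> c("x", "y"); non-list values unchanged
list_to_chr <- function(v) if (is.list(v)) as.character(unlist(v)) else v

# Toy data mixing the two shapes seen after read_json() + unnest_wider()
df <- tibble(
  name  = c("A", "B"),
  books = list(list("x", "y"), "z")
)

fixed <- df %>%
  mutate(across(where(is.list), ~ map(.x, list_to_chr))) %>%
  unnest_longer(books)
print(fixed)
```

Applied to irj_p1, the mutate() step would stand in for the map()/as_tibble() block before part_2().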

system · November 18, 2021, 9:49am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.