Here is the link to my rmarkdown that explains this whole thing: RPubs - bind_rows() and JSON data
Topic:
When confronted with NULL values or list() values present in JSON data converted to a list of lists by httr2, bind_rows() fails, resulting in empty tibbles and dropped rows/columns with no warnings or error messages.
This is a deeper dive into a similar (unanswered) post here:
This example data object is based on my actual use case, retrieved using httr2 and resp_body_json().
Sample data:
some_list <- list(
  list(
    id = 001,
    name = "bob",
    age = 23,
    country = NULL,
    items = list("apple", "banana", "pear")
  ),
  list(
    id = 002,
    name = "sam",
    age = NULL,
    country = NULL,
    items = list()
  ),
  list(
    id = 003,
    name = "joe",
    age = NULL,
    country = NULL,
    items = list()
  )
)
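To see where the trouble starts, it helps to look at the size of every field in each record. The sketch below uses only base R on a trimmed-down copy of the sample data (the names `records` and `field_sizes` are made up for illustration). My understanding, which is an assumption about the internals rather than something confirmed here, is that bind_rows() treats each inner list like a small data frame, so any field of size 0 is a candidate for a dropped column or a dropped row.

```r
# Trimmed-down copy of the sample data so this snippet runs on its own.
records <- list(
  list(id = 1, name = "bob", age = 23, country = NULL,
       items = list("apple", "banana", "pear")),
  list(id = 2, name = "sam", age = NULL, country = NULL,
       items = list())
)

# lengths() gives the size of each field; NULL and list() both report 0.
# sapply() returns one column per record, so t() puts records in rows.
field_sizes <- t(sapply(records, lengths))
field_sizes
```

Every 0 in that matrix marks a field that needs attention before binding.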
Desired (Tidy) Result:
# A tibble: 5 × 5
     id name    age country items
  <dbl> <chr> <dbl> <lgl>   <chr>
1     1 bob      23 NA      apple
2     1 bob      23 NA      banana
3     1 bob      23 NA      pear
4     2 sam      NA NA      NA
5     3 joe      NA NA      NA
Code required to achieve this result. In my opinion this is excessive and should not be necessary just to prevent dropped rows and columns:
some_list |>
  purrr::map(\(sub) purrr::map(sub, \(i) if (length(i) == 0) NA else i)) |>
  dplyr::bind_rows() |>
  dplyr::mutate(items = purrr::map(items, \(i) if (is.null(i)) NA else i)) |>
  tidyr::unnest(items)
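If this has to be done in more than one place, the first step of that pipeline can be wrapped in a small helper. This is only a sketch: `null_to_na()` is a hypothetical name, not a function from dplyr, purrr, or httr2, and it assumes records are nested just one level deep, as in the sample data.

```r
library(dplyr)

# Hypothetical helper: replace every size-0 field (NULL or list())
# in each record with NA so that bind_rows() keeps the row and column.
null_to_na <- function(records) {
  lapply(records, function(rec) {
    lapply(rec, function(field) if (length(field) == 0) NA else field)
  })
}

# Trimmed-down copy of the sample data so this snippet runs on its own.
records <- list(
  list(id = 1, name = "bob", age = 23, country = NULL,
       items = list("apple", "banana")),
  list(id = 2, name = "sam", age = NULL, country = NULL,
       items = list())
)

bound <- records |>
  null_to_na() |>
  bind_rows()

bound
```

With the NULL and list() fields patched first, both records and the all-missing country column should make it through bind_rows(); items is still a list column at this point and needs the unnest step afterwards.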
Summary copy/pasted from my rmarkdown post:
- bind_rows should not return an empty tibble when a list() column is present
- bind_rows should not drop columns where all are NULL
- bind_rows should not replace NA with NULL
- unnest() should not drop rows where val is NULL in unnest(val)
- Most importantly, if none of these behaviors can change, a warning message should be posted where columns / rows are dropped as a result.
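On the unnest() point specifically, tidyr does document a `keep_empty` argument for unnest() that keeps rows whose list element is size 0. The sketch below uses a made-up two-row tibble (`df`, `dropped`, and `kept` are illustrative names) to compare the default with `keep_empty = TRUE`. It does not address the bind_rows() points in the list.

```r
library(tibble)
library(tidyr)

# Made-up example: the second row has a NULL in the list column.
df <- tibble(
  id  = c(1, 2),
  val = list(c("apple", "banana"), NULL)
)

# Default behaviour: the row with the NULL element is dropped.
dropped <- unnest(df, val)

# keep_empty = TRUE keeps that row and fills val with NA.
kept <- unnest(df, val, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```

It is opt-in and silent by default, so the request for a warning still stands.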
At the very least, allowing a fill argument to bind_rows() would bring its functionality very close to rbindlist() for working with JSON data that have been converted to R list structures.
How are you guys handling JSON data within the tidyverse? Are these expectations unreasonable?
I suppose the argument could be made that the resp_body_json() function from httr2 should have an option to replace NULL with NA, given that JSON allows for null (empty) but has no NA (missing).
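One thing that may be worth trying on the parsing side: resp_body_json() takes a `simplifyVector` argument (FALSE by default) that it hands on to jsonlite, and jsonlite's simplification turns an array of records into a data frame. The sketch below calls jsonlite::fromJSON() directly on a made-up JSON string shaped like the sample data, so it runs without a live response. Whether this fits a real API response is an assumption; deeply nested payloads may simplify in less convenient ways.

```r
library(jsonlite)

# Made-up JSON shaped like the sample data.
json <- '[
  {"id": 1, "name": "bob", "age": 23,   "country": null,
   "items": ["apple", "banana", "pear"]},
  {"id": 2, "name": "sam", "age": null, "country": null,
   "items": []}
]'

# With simplification on, the array of records becomes a data frame.
parsed <- fromJSON(json, simplifyVector = TRUE)

str(parsed)
```

The items column still comes back as a list column, so an unnest step is still needed afterwards.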