Hi there, I have an unwieldy dataframe in which one column is a nested list of unknown depth (the example I've included only goes two levels deep, but the real data sometimes go further).
I'd like to be able to map the key:value pairs from all levels in the nested list into columns, where each unique key is a new column. To further complicate things, not all entries have all keys, so I need to be able to introduce NAs.
library(tidyverse)
library(jsonlite)
t <- 'id|json\n
1|{"latitude":"40.757","locationDetails":{"addressDetails":"2 sesame street","contactName":"ernie","phone":"987654321"},"longitude":"-73.924"}\n
2|{"city":"new york city","latitude":"40.756","locationDetails":{"addressDetails":"1 sesame street","contactName":"bert","phone":"123456"},"longitude":"-73.925","neighborhood":"astoria"}'
df <- read_delim(
t,
delim = '|'
)
df$json_list <- purrr::map(df$json, jsonlite::fromJSON, simplifyDataFrame = T)
If I do df$json_df <- purrr::map_dfr(df$json_list, magrittr::extract, c('latitude', 'longitude'))
, I can extract values from the first level, but that only works on the first level where data is present. If I try to include a key with NA
in some places, like city
, a fatal error is produced and the session is killed. Similarly, if I try to include keys from the second layer via locationDetails
or locationDetails.name
, a fatal error is produced and the session is killed, so I assume some form of layering of map_*
is needed, but haven't been able to figure it out. Please help!
The desired format:
df_desired_format <- data.frame(
id = c(1,2),
city = c(NA, 'new york city'),
latitude = c(40.757,40.756),
addressDetails = c('2 sesame street', '1 sesame street'),
contactName = c('ernie', 'bert'),
phone = c(987654321, 123456),
longitude = c(-73.924,-73.925),
neighbordhood = c(NA, 'astoria')
)
Thanks in advance!