Hi Posit Users.
I have a list of NPI numbers I'm querying from an API that pulls provider information:
https://npiregistry.cms.hhs.gov/api-page
I use a workflow something like this for each pull:
library(httr)
library(jsonlite)
library(tidyverse)
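# Query the NPI Registry API for a single NPI number
resp <- GET("https://npiregistry.cms.hhs.gov/api/?version=2.1", query = list(number = 1992975965))
# Parse the JSON body, flattening nested data frames where possible
out <- fromJSON(rawToChar(resp$content), flatten = TRUE, simplifyDataFrame = TRUE)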
This generates a list containing an integer (result_count) and a data frame (results), the latter being what I need.
However, this dataframe has quite a few list variables:
sapply(out$results,class)
         created_epoch       enumeration_type     last_updated_epoch                 number              addresses
           "character"            "character"            "character"            "character"                 "list"
     practiceLocations             taxonomies            identifiers              endpoints            other_names
                "list"                 "list"                 "list"                 "list"                 "list"
      basic.first_name        basic.last_name      basic.middle_name       basic.credential  basic.sole_proprietor
           "character"            "character"            "character"            "character"            "character"
          basic.gender basic.enumeration_date     basic.last_updated           basic.status      basic.name_prefix
           "character"            "character"            "character"            "character"            "character"
     basic.name_suffix
           "character"
I would like to flatten these list columns as much as possible.
The following does not work, which I assume is due to size recycling across the list columns:
out$results |> unnest_longer(where(is.list))
Error in `unnest_longer()`:
! In row 1, can't recycle input of size 2 to size 0.
Run `rlang::last_trace()` to see where the error occurred.
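For anyone who wants to try this without calling the API, here is a small toy example that I believe mimics the structure. The column values are made up, and I am assuming that the empty fields come back from jsonlite as empty lists while the populated ones come back as data frames. Under that assumption, unnesting both columns at once should fail the same way, because a 2-row element and a 0-length element sit in the same row.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for out$results: one row, two list-columns
# whose elements have different sizes (2 vs 0)
toy <- tibble(
  number    = "123",
  addresses = list(data.frame(city = c("A", "B"))),  # element of size 2
  endpoints = list(list())                           # element of size 0
)

# Unnest both list-columns in one call and capture any error message
res <- tryCatch(
  unnest_longer(toy, c(addresses, endpoints)),
  error = function(e) conditionMessage(e)
)
print(res)
```

If the assumption holds, res ends up holding an error message rather than a tibble.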
pivot_longer() works, and I don't mind the extra rows, but it has the unfortunate side effect of creating a variable called value that is also a list:
sapply(out$results |> pivot_longer(where(is.list)),class)
         created_epoch       enumeration_type     last_updated_epoch                 number       basic.first_name
           "character"            "character"            "character"            "character"            "character"
       basic.last_name      basic.middle_name       basic.credential  basic.sole_proprietor           basic.gender
           "character"            "character"            "character"            "character"            "character"
basic.enumeration_date     basic.last_updated           basic.status      basic.name_prefix      basic.name_suffix
           "character"            "character"            "character"            "character"            "character"
                  name                  value
           "character"                 "list"
Despite my best efforts, I have been completely unable to flatten this particular variable:
> out$results |> pivot_longer(where(is.list)) |> unnest_longer(where(is.list))
Error in `col_to_long()`:
! Can't combine `..1$value` <data.frame> and `..3$value` <list>.
Run `rlang::last_trace()` to see where the error occurred.
> out$results |> pivot_longer(where(is.list)) |> pivot_longer(where(is.list))
Error in `pivot_longer()`:
! Names must be unique.
✖ These names are duplicated:
* "name" at locations 16 and 17.
ℹ Use argument `names_repair` to specify repair strategy.
Run `rlang::last_trace()` to see where the error occurred.
> out$results |> pivot_longer(where(is.list)) |> unnest(where(is.list))
Error in `list_unchop()`:
! Can't combine `x[[1]]` <data.frame> and `x[[2]]` <list>.
Run `rlang::last_trace()` to see where the error occurred.
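My guess, going by the "Can't combine <data.frame> and <list>" messages, is that value ends up holding a mix of element types after pivot_longer(). Here is a minimal sketch on a toy tibble (hypothetical column names and values, not the real API output) that checks what each element of value actually is:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for out$results
toy <- tibble(
  number    = "123",
  addresses = list(data.frame(city = c("A", "B"))),  # populated field
  endpoints = list(list())                           # empty field
)

# Stack the list-columns into name/value pairs, as in my call above
long <- pivot_longer(toy, c(addresses, endpoints))

# Class of each element stored in the value list-column
elt_classes <- vapply(long$value, function(x) class(x)[1], character(1))
print(setNames(elt_classes, long$name))
```

On the toy data, one element is a data frame and the other is a plain empty list, which would explain why nothing can combine them into a single column type.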
And while unpack() will run, the variable value remains a list and seems to behave as if nothing happened:
> sapply(out$results |> pivot_longer(where(is.list)) |> unpack(where(is.list)),class)
         created_epoch       enumeration_type     last_updated_epoch                 number       basic.first_name
           "character"            "character"            "character"            "character"            "character"
       basic.last_name      basic.middle_name       basic.credential  basic.sole_proprietor           basic.gender
           "character"            "character"            "character"            "character"            "character"
basic.enumeration_date     basic.last_updated           basic.status      basic.name_prefix      basic.name_suffix
           "character"            "character"            "character"            "character"            "character"
                  name                  value
           "character"                 "list"
Would anyone be able to tell me what I am doing incorrectly with this call?
Thank you in advance!