Merge nested lists from an iterative API response (httr2)

I'm having trouble getting useful output from my iterative API request (created with httr2). I am new to APIs; last year I was able to create 2 datasets from an API query via an R notebook: R Notebook 2024: NWOpen-API (the code is commented in English inside the chunks, but outside the chunks I use Dutch). Basically, I flipped to the next page manually, processed each response and saved the result, after which I merged the files into the 2 datasets. I was aware that it could be done via httr2, but due to time constraints I was unable to make that solution work at the time.
I have made some progress and can now query the API and get the responses from all pages in a list, but I am unable to use the response effectively.

I may be doing several things wrong or missing an obvious solution, but I would like to bind the nested lists, and since these are lists and not a data frame my attempts failed. Converting the lists to a data frame (so that I could use the same process I used last year to create the 2 datasets) didn't work either.

I tried to merge them in R using different methods I thought of or found searching the web (I think that, because the lists have the same name, I couldn't bind them using append()). The current state of this R notebook can be found at R Notebook 2025: NWOpen-API.
And here is the reprex (sorry for the length; it is the first time I'm using the package):

# Packages needed
  package.list <- c("tidyverse", "rmarkdown", "httr2", "reprex",
                    "jsonlite", "vctrs", "openxlsx2")


# Install missing packages 
  new.packages <- package.list[!(package.list %in% installed.packages()[,"Package"])]
  if(length(new.packages)) install.packages(new.packages)

  
# Load packages needed (and subdue)
  invisible(lapply(package.list, library, character.only = TRUE))
── Attaching core tidyverse packages ───────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ─────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
Attaching package: ‘jsonlite’

The following object is masked from ‘package:purrr’:

    flatten


Attaching package: ‘vctrs’

The following object is masked from ‘package:dplyr’:

    data_frame

The following object is masked from ‘package:tibble’:

    data_frame



  
# URL NWOpen-API
  NWOpen_base_url <- "https://nwopen-api.nwo.nl/NWOpen-API/api/Projects"

# Iteration request
req <- request(NWOpen_base_url) |>
  req_url_query(
    rs_start_date = "2023-01-01",
    re_start_date = "2024-12-31",
    organisation = "\"Universiteit Leiden\""
  )

NWO_resps <- req_perform_iterative(
  req,
  next_req = iterate_with_offset(
    "page",
    resp_pages = function(resp) resp_body_json(resp)$meta$pages
  ),
  max_reqs = Inf
)


# Get response JSON body
  NWO_response <- NWO_resps |>
    resps_successes() |>
    resps_data(\(resp) resp_body_json(resp))

  
# Get projects list from response data
  NWO_responses_projects <- NWO_response[str_detect(names(NWO_response), "projects")]


reprex:::reprex_addin()
#> Error in parse(text = input): <text>:13:1: unexpected invalid token
#> 12:   invisible(lapply(package.list, library, character.only = TRUE))
#> 13: ─
#>     ^

Created on 2025-03-12 with reprex v2.1.1

Session info
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Monterey 12.7.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Amsterdam
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     fastmap_1.2.0     xfun_0.49         glue_1.8.0       
#>  [5] knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29    lifecycle_1.0.4  
#>  [9] cli_3.6.3         reprex_2.1.1      withr_3.0.2       compiler_4.4.1   
#> [13] rstudioapi_0.17.1 tools_4.4.1       evaluate_1.0.1    yaml_2.3.10      
#> [17] rlang_1.1.4       fs_1.6.5

Can someone point me in the right direction for binding the nested lists with the same name, or does the iterative request perhaps need to be amended?

Thank you for reading.

That's what resps_data() does; you just need to provide it a function that extracts what you actually need, e.g. the projects list. For example, try changing the resps_data() function to:

 \(resp) resp_body_json(resp)$projects 

library(httr2)
library(jsonlite)

NWOpen_base_url <- "https://nwopen-api.nwo.nl/NWOpen-API/api/Projects"

NWO_responses_projects <-
  request(NWOpen_base_url) |>
  req_url_query(
    rs_start_date = "2023-01-01",
    re_start_date = "2024-12-31",
    organisation = "\"Universiteit Leiden\"",
    # reduce response length for testing
    per_page = 10
  ) |>
  req_perform_iterative(
    next_req = iterate_with_offset(
      "page",
      resp_pages = function(resp) resp_body_json(resp)$meta$pages
    ),
    # limit max requests for testing
    max_reqs = 3 # Inf
  ) |>
  resps_successes() |>
  resps_data(\(resp) resp_body_json(resp)$projects)

The resulting list combines all the projects items from all responses:

lobstr::tree(NWO_responses_projects, max_depth = 3, max_length = 30)
#> <list>
#> ├─<list>
#> │ ├─project_id: "040.11.751"
#> │ ├─title: "From local to global in  Banach..."
#> │ ├─funding_scheme_id: 4134
#> │ ├─funding_scheme: "Bezoekersbeurs Bezoekersbeurs 20..."
#> │ ├─department: "Sociale en Geesteswetenschappen"
#> │ ├─sub_department: "Sociale en Geesteswetenschappen"
#> │ ├─start_date: "2024-01-15T00:00:00"
#> │ ├─summary_nl: "Tingley's problem is the followi..."
#> │ ├─summary_en: "Tingley's problem is the followi..."
#> │ └─project_members: <list>
#> │   ├─<list>...
#> │   └─<list>...
#> ├─<list>
#> │ ├─project_id: "VI.Veni.211F.084"
#> │ ├─title: "Places to not Forget: De-Silenci..."
#> │ ├─funding_scheme_id: 4330
#> │ ├─funding_scheme: "NWO-Talentprogramma Veni SGW 202..."
#> │ ├─department: "Sociale en Geesteswetenschappen"
#> │ ├─sub_department: "Sociale en Geesteswetenschappen"
#> │ ├─start_date: "2023-10-01T00:00:00"
#> │ ├─summary_nl: "Dit project onderzoekt hedendaag..."
#> │ ├─summary_en: "This project investigates archae..."
#> │ └─project_members: <list>
#> │   ├─<list>...
#> │   ├─<list>...
#> │   └─<list>...
#> ├─<list>
#> │ ├─project_id: "19716"
#> ...
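
If the eventual goal is a data frame (like last year's two datasets), this list can be rectangled with tidyr. A minimal sketch, assuming the structure shown in the tree above (NWO_projects_df is just an illustrative name; nested fields such as project_members end up as list-columns):

library(tibble)
library(tidyr)

# One row per project: scalar fields become regular columns,
# nested fields such as project_members stay behind as list-columns
NWO_projects_df <-
  tibble(project = NWO_responses_projects) |>
  unnest_wider(project)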

Or you could wrap the whole response body in list(), extract projects from every top-level item and then combine, something like:

... |>
  resps_successes() |>
  resps_data(\(resp) list(resp_body_json(resp))) |> 
  purrr::map("projects") |> 
  purrr::list_c()
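
A possible advantage of keeping the whole body per response, as in this second variant, is that the page-level metadata stays available next to the projects. A sketch under the same assumptions as above (meta is the element used for paging; NWO_pages and NWO_meta are just illustrative names):

NWO_pages <-
  ... |>
  resps_successes() |>
  resps_data(\(resp) list(resp_body_json(resp)))

# projects combined across pages, as in the pipeline above
NWO_responses_projects <- purrr::list_c(purrr::map(NWO_pages, "projects"))

# per-page metadata remains accessible from the same pages list
NWO_meta <- purrr::map(NWO_pages, "meta")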

Thank you very much margusl for your help! Both for answering (and solving) my question, for amending my query (which helps me understand the possibilities better) and for reducing the response time (I had only thought to adjust the dates of the query, not the "responses per page", so this is very insightful).

Your first suggestion solved my question, but I would eventually also like to explore whether the second option works as well (for those who might read this in the future: the R Notebook 2025: NWOpen-API linked above will eventually contain an exploration of both options).

Note that I only fiddled with per_page & max_reqs to generate fewer requests and smaller responses for more convenient testing. When collecting a complete set, I wouldn't expect it to be (considerably) faster than the default of 100 results per response.
