How to use an API in R?

I do not know how to use httr to bulk-download the PIK data from the database behind this website:
https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=pik&historical-emissions-gases=&historical-emissions-regions=&historical-emissions-sectors=&page=1

res = httr::GET("https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215")
data = jsonlite::fromJSON(rawToChar(res$content))

Hi @melgoussi

You were almost there:

res = httr::GET("https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215")
data = httr::content(res)

Additional information:

httr will recognise that the response is JSON and call jsonlite under the hood.
There is also the httr2 package, which has a more modern API, so you could check it out too (same author as httr).
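
For reference, here is a minimal httr2 sketch of the same request (assuming httr2 is installed); resp_body_json() parses the JSON body much like httr::content() does:

library(httr2)

resp <- request("https://www.climatewatchdata.org/api/v1/data/historical_emissions") |>
  req_url_query(`historical-emissions-data-sources` = 215) |>
  req_perform()
data <- resp_body_json(resp)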

Hi vedoa,
It does not grab all the data...

Well, then you have to page through the results:

library(httr)
# call api
res = httr::GET("https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215")
# extract content
data = httr::content(res)

# inspect the API response: the Link header holds pagination info for all available results
links <- trimws(unlist(strsplit(res$headers$link, split = ",")))
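# a Link header value looks roughly like this (illustrative, per RFC 5988):
#   <https://.../historical_emissions?...&page=2>; rel="next", <https://.../historical_emissions?...&page=472>; rel="last"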

# helper function to extract the URL from a Link header entry
get_link <- function(x){
  # find everything between "<" and ">"
  pattern <- "<(.*?)>"
  result <- regmatches(x, regexec(pattern, x))
  # if nothing is found return NA
  if(length(result) == 0) return(NA)
  result[[1]][2]
}
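# e.g. get_link('<https://example.org/api?page=2>; rel="next"') returns
# "https://example.org/api?page=2" (example.org is just an illustration)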

# recursive helper: follow the next link and append the new results
append_new_results <- function(data, res){
  # extract links from header
  links <- trimws(unlist(strsplit(res$headers$link, split = ",")))
  # get last (if exists)
  linkLast <- get_link(links[grepl(x = links, pattern = "rel=\"last\"")])
  # nothing found then we are done
  if(is.na(linkLast)){
    # no more next links
    return(data)
  }
  # get next link
  linkNext <- get_link(links[grepl(x = links, pattern = "rel=\"next\"")])
  # something to look at :D 
  print(paste0("calling ", linkNext))
  # call api with next link
  res <- httr::GET(linkNext)
  # extract data
  dataTmp <- httr::content(res)
  # append to current data
  keys <- unique(c(names(data), names(dataTmp)))
  data <- setNames(mapply(c, data[keys], dataTmp[keys]), keys)
  # call next
  append_new_results(data, res)
}

data <- append_new_results(data, res)

This will take a while; there are 472 pages with ~50 results each.

[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=2"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=3"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=4"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=5"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=6"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=7"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=8"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=9"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=10"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=11"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=12"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=13"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=14"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=15"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=16"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=17"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=18"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=19"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=20"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=21"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=22"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=23"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=24"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=25"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=26"
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=27"
...
[1] "calling https://www.climatewatchdata.org/api/v1/data/historical_emissions?historical-emissions-data-sources=215&page=472"

The code can of course be simplified and written better (parallelized, etc.), but this should get you going; see the sketch below for a non-recursive variant.

Also, take care: the result will be a huge list of lists of lists.
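
For illustration, here is a non-recursive sketch that simply loops over the page query parameter; the page parameter, the 472-page total, and the $data element of the parsed response are assumptions taken from the rest of this thread:

library(httr)

base_url <- "https://www.climatewatchdata.org/api/v1/data/historical_emissions"
pages <- lapply(seq_len(472), function(p) {
  # request one page at a time
  res <- httr::GET(base_url,
                   query = list(`historical-emissions-data-sources` = 215, page = p))
  httr::content(res)$data
})
# flatten the per-page record lists into one long list
all_records <- unlist(pages, recursive = FALSE)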

@vedoa This is brilliant, man. I am still wrapping my head around recursive functions. For practice, I made a function below that extracts the results from your code's output and combines them into one big data frame. The input argument page is the list output from your function append_new_results above. I ran this all and it works. Cheers

# Function to extract the data and create one big data frame
# This uses the brilliant API output object from @vedoa
# @param page List The list output from API get function `append_new_results`

library(magrittr) # provides the %>% pipe; purrr, dplyr and tidyr are called via :: below

page2DF <- function(page) {
    d <- page[['data']]
    
    left_cols <- purrr::map(seq_along(d), ~ list2DF(d[[.x]][1:7])) %>% dplyr::bind_rows()
    right <- purrr::map(seq_along(d), ~ d[[.x]][[8]])
    dfs <- purrr::map(seq_along(right), ~ purrr::list_transpose(right[[.x]], simplify = TRUE, default = NA_real_))
    vals <- purrr::map(seq_along(dfs), ~ list2DF(dfs[[.x]]) %>% tidyr::pivot_wider(names_from = 'year'))
    
    out <- dplyr::bind_cols(left_cols, dplyr::bind_rows(vals))
    return(out)
}

# Convert API Data List to DF

data_df <- page2DF(data)
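
Optionally, you can persist the bulk download so you do not have to hit the API again (the file name here is just an example):

write.csv(data_df, "pik_historical_emissions.csv", row.names = FALSE)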

