I would like to collect the links on the main page, "Economic contribution of Tourism and beyond: Data on the economic contribution of Tourism", and then use rvest's session functions to get the download links from its child pages (Tourism GDP and Employed Persons). This is what I have so far:
library(rvest)  # also makes the %>% pipe available
url <- "https://www.unwto.org//tourism-statistics/economic-contribution-SDG"
url %>%
  rvest::session() %>%
  rvest::html_elements(css = "div.field--field-card-paragraphs a") %>%
  xml2::xml_attr("href")
keithn
You could try this:
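library(rvest)
url <- "https://www.unwto.org//tourism-statistics/economic-contribution-SDG"
css_selector <- "div.field--field-card-paragraphs a"
# Open one session and reuse it for the landing page and its child pages
unwto_session <- rvest::session(url)
unwto_session %>%
  rvest::html_elements(css = css_selector) %>%
  xml2::xml_attr("href") %>%
  # Links that already end in .xlsx are kept as-is; any other link is
  # treated as a child page and scraped with the same selector
  purrr::map_if(
    ~ !grepl(pattern = "\\.xlsx$", x = .x),
    ~ unwto_session %>%
      rvest::session_jump_to(.x) %>%
      rvest::html_elements(css = css_selector) %>%
      xml2::xml_attr("href"),
    .else = ~ .x
  ) %>%
  # Flatten the list of character vectors into one character vector
  purrr::list_c()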
#> [1] "http://pre-webunwto.s3.eu-west-1.amazonaws.com/s3fs-public/2025-06/UN_Tourism_12_b_1_TSA_SEEA_04_2025.xlsx"
#> [2] "https://pre-webunwto.s3.eu-west-1.amazonaws.com/s3fs-public/2025-06/UN_Tourism_8_9_1_TDGDP_04_2025.xlsx"
#> [3] "https://pre-webunwto.s3.eu-west-1.amazonaws.com/s3fs-public/2025-06/UN_Tourism_8_9_2_employed_persons_04_2025.xlsx"
Created on 2025-07-07 with reprex v2.1.1
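To see how the `map_if()` / `.else` branching behaves without touching the network, here is a minimal offline sketch. The links are made up, and `follow_link()` is a hypothetical stand-in for the `session_jump_to()` plus `html_elements()` step; it simply pretends each child page exposes one `.xlsx` link.

```r
library(purrr)

# Made-up links as they might come back from the landing page:
# one direct .xlsx download and two child pages (illustrative values only).
links <- c(
  "https://example.org/files/a.xlsx",
  "https://example.org/child-1",
  "https://example.org/child-2"
)

# Hypothetical stand-in for the session_jump_to() + html_elements() step:
# pretend every child page exposes exactly one .xlsx link.
follow_link <- function(link) paste0(link, "/data.xlsx")

# TRUE when the link already points at an .xlsx file
is_xlsx <- function(link) grepl("\\.xlsx$", link)

result <- links %>%
  # Non-.xlsx links are "followed"; .xlsx links pass through unchanged
  map_if(~ !is_xlsx(.x), follow_link, .else = ~ .x) %>%
  # Flatten the resulting list into a single character vector
  list_c()

print(result)
```

`map_if()` always returns a list, and a real child page may yield several links per element, so `list_c()` is what turns the nested result back into one flat character vector.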