Can a deployed Shiny app on shinyapps.io fetch an updated CSV from GitHub without republishing?

I have a Shiny app deployed to shinyapps.io that reads a large (~30 MB) CSV file hosted on GitHub (public repo).

  • In development, I can use reactivePoll() with a HEAD request to check the Last-Modified header and download the file only when it changes.
  • This works locally: the file updates automatically while the app is running.

However, after deploying to shinyapps.io, the app only ever uses the file that existed at deploy time. Even though the GitHub file changes, the deployed app doesn’t pull the update unless I redeploy the app.

Question:

  • Is shinyapps.io capable of fetching a fresh copy of the file from GitHub at runtime, or does the server’s container isolate the app so it can’t update external data unless redeployed?
  • If runtime fetching is possible, are there special settings or patterns I should use so the app refreshes the data from GitHub without redeploying?

My goal is to have a live map of data that doesn't require the user to refresh or reload when new data is available.

Here's what I'm trying:

# Requires httr (HEAD, timeout, status_code, headers) and readr.
# merged_url and expected_cols are defined elsewhere in the app;
# this block lives inside server() so that `session` is available.

.cache <- NULL
.last_mod_seen <- NULL

data_raw <- reactivePoll(
  intervalMillis = 60 * 1000,  # check every 60s
  session = session,

  # checkFunc: HEAD request to read the Last-Modified header
  checkFunc = function() {
    res <- tryCatch(
      HEAD(merged_url, timeout(5)),
      error = function(e) NULL
    )
    if (is.null(res) || status_code(res) >= 400) {
      # On failure, return the previous value so we DON'T trigger a download
      return(.last_mod_seen)
    }
    lm <- headers(res)[["last-modified"]]
    if (is.null(lm)) {
      # If the header is missing (rare), fall back to the previous value
      # to avoid spurious fetches
      return(.last_mod_seen)
    }
    .last_mod_seen <<- lm
    lm
  },

  # valueFunc: only called when Last-Modified changes
  valueFunc = function() {
    message("Downloading updated merged.csv from GitHub...")
    df <- tryCatch(
      readr::read_csv(merged_url, col_types = expected_cols, na = "null",
                      show_col_types = FALSE),
      error = function(e) {
        # Keep serving the cached data if the download fails
        if (!is.null(.cache)) return(.cache)
        stop(e)
      }
    )
    .cache <<- df
    df
  }
)
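
Elsewhere in server() I consume the polled value like any other reactive. A minimal sketch (output$row_count here is just illustrative, not my real output):

output$row_count <- renderText({
  # Re-executes whenever reactivePoll's valueFunc delivers a new data frame
  df <- data_raw()
  paste("Rows loaded:", nrow(df))
})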

Bump....

Why is there no information about reactive polling on Posit's servers?

I'm looking for, and not seeing, the code that sets the value for "merged_url". Does it point to a file source outside the shinyapps domain?

Correct, it's hosted on GitHub: https://raw.githubusercontent.com/tdecapctsv/aqdata/main/merged.csv

RAW <- "https://raw.githubusercontent.com/tdecapctsv/aqdata/main"
meta_url <- file.path(RAW, "historic_sensors_metadata.csv")
merged_url <- file.path(RAW, "merged.csv")

I've also tried ditching the check for new files; since I'm the one updating the file, I know when it's new. I tried using reactiveTimer() to grab the file at minute 15 of each hour, but when it does, the app disconnects from the server and has to be reloaded in order to get the new file.

Locally, it works perfectly and silently updates the data without breaking the app.

It might be easier to diagnose if you can strip out anything not essential to demonstrating the problem and post a complete reprex here.

Appreciate your help with this. Here's a stripped-down version that reads the two files and plots the latest data values on a map. It runs locally, and you can watch the timestamp under the map at around 15 minutes after the hour to see the file update. Once deployed to shinyapps.io, instead of updating the timestamp, the app disconnects from the server.

# Minimal reproducible app: read GitHub CSVs, map, refresh at :15 each hour
library(shiny)
library(leaflet)
library(readr)
library(dplyr)
library(lubridate)

# --- GitHub raw URLs (public) ---
RAW <- "https://raw.githubusercontent.com/tdecapctsv/aqdata/main"
meta_url   <- file.path(RAW, "historic_sensors_metadata.csv")
merged_url <- file.path(RAW, "merged.csv")

# Columns we actually need from merged.csv
expected_cols <- cols_only(
  time_stamp       = col_double(),
  sensor_index     = col_character(),
  pm_cor           = col_double()
)

# Helper: fetch & prepare the latest hour joined with metadata
fetch_latest <- function() {
  # Read metadata (kept small; adjust if needed)
  meta <- read_csv(meta_url, show_col_types = FALSE) %>%
    mutate(sensor_index = as.character(sensor_index)) %>%
    select(sensor_index, latitude, longitude, Friendly_Name, pollutants, sensor_man)
  
  # Read merged (large)
  df <- read_csv(merged_url, col_types = expected_cols, na = "null", show_col_types = FALSE) %>%
    mutate(time_stamp = as_datetime(time_stamp, tz = "UTC"))
  
  # Use most recent complete hour (as in your app logic)
  latest_time <- floor_date(max(df$time_stamp, na.rm = TRUE) - hours(1), unit = "hour")
  
  df %>%
    filter(time_stamp == latest_time) %>%
    group_by(sensor_index) %>%
    slice_tail(n = 1) %>%
    ungroup() %>%
    left_join(meta, by = "sensor_index") %>%
    filter(!is.na(latitude), !is.na(longitude))
}

ui <- fluidPage(
  titlePanel("Minimal GitHub-backed Sensor Map"),
  fluidRow(
    column(12,
           tags$p("Auto-refresh at/after :15 each hour (America/Chicago)."),
           leafletOutput("map", height = 600),
           tags$div(style = "margin-top:8px;",
                    textOutput("status", inline = TRUE)
           )
    )
  )
)

server <- function(input, output, session) {
  # Reactive storage
  rv <- reactiveValues(df = NULL, last_hour_loaded = as.POSIXct(NA), last_status = "Starting…")
  
  # Central Time hour bucket
  ct_hour <- function(t = Sys.time()) floor_date(with_tz(t, "America/Chicago"), unit = "hour")
  ct_min  <- function(t = Sys.time()) minute(with_tz(t, "America/Chicago"))
  
  # Initial load (synchronous for simplicity in this minimal example)
  observeEvent(TRUE, {
    rv$last_status <- "Loading initial data…"
    df <- try(fetch_latest(), silent = TRUE)
    if (inherits(df, "try-error") || !nrow(df)) {
      rv$last_status <- "Initial load failed."
    } else {
      rv$df <- df
      rv$last_hour_loaded <- ct_hour()
      rv$last_status <- paste0("Loaded hour ", format(rv$last_hour_loaded, "%Y-%m-%d %H:00 %Z"))
    }
  }, once = TRUE, ignoreInit = FALSE)
  
  # Timer: tick every minute; refresh once per hour at/after :15
  tick <- reactiveTimer(60 * 1000, session = session)
  observe({
    tick()
    # Conditions: after minute 15 and not yet loaded this hour
    this_hr <- ct_hour()
    if (ct_min() >= 15 && (is.na(rv$last_hour_loaded) || rv$last_hour_loaded < this_hr)) {
      rv$last_status <- "Refreshing data…"
      df <- try(fetch_latest(), silent = TRUE)
      if (!inherits(df, "try-error") && nrow(df)) {
        rv$df <- df
        rv$last_hour_loaded <- this_hr
        rv$last_status <- paste0("Refreshed hour ", format(this_hr, "%Y-%m-%d %H:00 %Z"))
      } else {
        rv$last_status <- "Refresh failed (keeping previous data)."
      }
    }
  })
  
  # Map
  output$map <- renderLeaflet({
    req(rv$df)
    df <- rv$df
    
    # Simple color binning by pm_cor (tweak bins/palette as you like)
    bins <- c(0, 12, 35, 55, 150, 250, Inf)
    pal  <- colorBin(c("green","yellow","orange","red","purple","maroon"),
                     domain = df$pm_cor, bins = bins, na.color = "gray")
    
    leaflet(df) %>%
      addTiles() %>%
      addCircleMarkers(
        lng = ~longitude, lat = ~latitude,
        # labels render as plain text, so avoid raw <br> tags here
        label = ~paste0("Sensor ", sensor_index, " | PM (pm_cor): ", round(pm_cor, 1)),
        color = ~pal(pm_cor), fillColor = ~pal(pm_cor), fillOpacity = 0.9,
        radius = 8, stroke = TRUE, weight = 1
      ) %>%
      addLegend("bottomright", pal = pal, values = ~pm_cor, title = "PM (pm_cor)", opacity = 1)
  })
  
  # Status line
  output$status <- renderText({
    paste("Status:", rv$last_status)
  })
}

shinyApp(ui, server)

Have you looked at the application log on shinyapps.io to see if there are any error messages that might explain the disconnection?
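
You can also pull the logs from your local R session with rsconnect; for example (a minimal sketch; substitute the app name shown on your shinyapps.io dashboard):

library(rsconnect)

# Streams recent log output from the deployed app.
# "your-app-name" is a placeholder, not a real app name.
showLogs(appName = "your-app-name", streaming = TRUE)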

Also, you might try changing the try() call (where you call fetch_latest()) to tryCatch(), using the error handler to display a message in the UI and then invoking req(FALSE) to prevent any further processing that might trigger the disconnection.
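
Roughly like this, as a sketch against your reprex's refresh observer (untested on shinyapps.io):

observe({
  tick()
  this_hr <- ct_hour()
  if (ct_min() >= 15 && (is.na(rv$last_hour_loaded) || rv$last_hour_loaded < this_hr)) {
    rv$last_status <- "Refreshing data…"
    df <- tryCatch(
      fetch_latest(),
      error = function(e) {
        # Surface the real error in the UI instead of letting it escape the observer
        rv$last_status <- paste("Refresh failed:", conditionMessage(e))
        req(FALSE)  # silently halt this observer run; previous data is kept
      }
    )
    if (nrow(df) > 0) {
      rv$df <- df
      rv$last_hour_loaded <- this_hr
      rv$last_status <- paste0("Refreshed hour ", format(this_hr, "%Y-%m-%d %H:00 %Z"))
    }
  }
})

If the error message that shows up in the status line (or in the log) points at the download itself, that will narrow down whether the container is actually blocked from reaching GitHub at runtime or something else is killing the session.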