Curbcut

David_Wachsmuth · September 13, 2024, 7:39pm

Authors: David Wachsmuth, Maxime Bélanger de Blois

Abstract: Curbcut is a platform for interactive geovisualization and analysis of urban sustainability, built on four mutually supportive components: 1) a diverse and constantly growing range of datasets which capture urban sustainability relationships in a variety of spatial and temporal patterns; 2) a multi-scalar data architecture which permits extremely performant data manipulation, comparison and visualization; 3) a set of composable and extensible methods for data analysis and visualization which allow individual variables or bivariate combinations of variables to be dynamically and flexibly represented; and 4) a set of pages for performantly displaying curated vertical slices of data in a simultaneously deep and intuitive fashion.

Full Description:

Overview

Can we gather a huge range of data sources about sustainability in an urban region, and facilitate deep, dynamic and intuitive geospatial explorations and analyses of them? Our answer to this question is Curbcut: a public-facing, high-performance Shiny app which offers users a dynamic and responsive window into pressing sustainability issues.

After an effervescence of interest in geovisualization dashboards in the 2010s, we argue that several key tensions remain unresolved. The first is the tension between depth and breadth. Web technology can allow dozens—or hundreds—of spatial variables or layers to be presented in an interactive map, and this is common in municipal GIS tools. But presenting a large number of layers usually results in a “lowest common denominator” approach, where the distinctive characteristics of each layer are effaced. The second tension is between flexibility and intuitiveness. In theory, the different layers in an interactive map could contain important spatial or non-spatial patterns either on their own or in relation to each other. But exposing an interface to allow flexible explorations of these patterns is usually at odds with an interface which is intuitive to non-expert users. The third tension is between quantitative and qualitative data. Incorporating qualitative data into tools which tend to be dominated by quantitative data is increasingly recognized as necessary to bridge the gap between experts and decision-makers. But it is not straightforward to incorporate this type of data into a platform which is designed overwhelmingly around the assumptions of quantitative data. Curbcut is a platform which tries to take each of these “either/or” tensions and turn them into “both/and”: depth and breadth, flexibility and intuitiveness, quantitative and qualitative data.

The platform

The platform is built on four mutually supportive components: 1) a diverse and constantly growing range of datasets which capture urban sustainability relationships in a variety of spatial and temporal patterns; 2) a multi-scalar data architecture which permits extremely performant data manipulation, comparison and visualization; 3) a set of composable and extensible methods for data analysis and visualization which allow individual variables or bivariate combinations of variables to be dynamically and flexibly represented; and 4) a set of pages for performantly displaying curated vertical slices of data in a simultaneously deep and intuitive fashion.

Data: The ambition of Curbcut is to allow any dataset which speaks to urban sustainability issues in one of our target geographies to be integrated into the platform. While this ambition is not yet fully realized, the platform currently includes datasets featuring point, line, polygon, and network geometries in both point-in-time and longitudinal temporalities, images and videos, and text.

Multi-scalar data architecture: A multi-scalar data architecture permits extremely performant data manipulation, comparison and visualization. A Curbcut instance defines a set of “scales” (groups of spatial features which are usually mutually exclusive and collectively exhaustive) and “regions” (overall spatial extents, within which some set of scales are hierarchically organized). These scales and regions are stored as vector tiles. Incoming data is added or interpolated to some or all of these scales and regions on import, and then stored as a set of non-spatial tables in an SQL database. Browser-side JavaScript queries and computes on these tables to update the vector tiles performantly. A standard limitation of vector-tile-based geovisualization is that each tile can only contain a relatively small amount of data, since tiles need to be dynamically fetched in response to user interaction with a map. (If the tiles contain too much data, they will load slowly enough that the map is not usable.) This means that vector tile sets tend to be capable of displaying only a small number of different variables, and rely on computationally expensive runtime processing of those variables to achieve more complex results. The Curbcut platform separates spatial and non-spatial data in its processing pipeline, so that vector tiles only contain a set of unique IDs, while other variables are stored in SQL tables whose values can be linked at runtime to the IDs in the vector tiles. This means that our system can scale to presenting tens of thousands of variables for each tile, which to our knowledge is unique or at least extremely rare in the interactive geospatial analysis domain.
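
To make the ID-linking idea concrete, here is a minimal base R sketch of the pattern described above. It is not Curbcut's actual code: the feature IDs, the `housing_rent` table and the helper names `link_values()` and `bin_values()` are all hypothetical, a data frame stands in for an SQL table, and in the real platform this step is described as running in browser-side JavaScript rather than in R.

```r
# Hypothetical feature IDs, as they might appear in a vector tile that
# carries geometry and IDs but no attribute data.
tile_ids <- c("CT_001", "CT_002", "CT_003", "CT_004")

# Hypothetical non-spatial table for one variable, keyed by the same IDs.
# (A data frame stands in for an SQL table here.)
housing_rent <- data.frame(
  ID = c("CT_003", "CT_001", "CT_004", "CT_002"),
  value = c(1250, 870, 1600, 990),
  stringsAsFactors = FALSE
)

# Look up the value for each tile feature by ID; output follows tile order.
link_values <- function(ids, var_table) {
  var_table$value[match(ids, var_table$ID)]
}

# Assign each value to one of `n_bins` quantile bins for a choropleth fill.
bin_values <- function(values, n_bins = 3) {
  breaks <- quantile(values, probs = seq(0, 1, length.out = n_bins + 1),
                     na.rm = TRUE, names = FALSE)
  findInterval(values, breaks, rightmost.closed = TRUE, all.inside = TRUE)
}

values <- link_values(tile_ids, housing_rent)
fill_bin <- setNames(bin_values(values), tile_ids)
```

Because the tile only carries IDs, swapping to a different variable means swapping the table passed to `link_values()`; the tile itself never has to be rebuilt or refetched.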

Composable and extensible methods: A set of composable and extensible methods for data analysis and visualization, implemented using the S3 framework in R, allows individual variables or bivariate combinations of variables to be presented and manipulated without any manual intervention. These methods dispatch on the attributes of the one or two variables to be analyzed, in the manner of an object-oriented programming system, and define rules for cartography (e.g. a single continuous polygon variable should be visualized with a binned choropleth map), graphing (e.g. a pair of continuous variables should be compared with a scatterplot, while a continuous variable and a categorical variable should be compared with a boxplot), and textual analysis. When new data types are added to Curbcut, new methods can be defined so that these data types can be gracefully integrated into the platform.
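
As an illustration of this kind of dispatch, the following base R sketch encodes the three example rules mentioned above as S3 methods. Every name in it (`new_variable()`, `var_pair()`, `map_style()`, `graph_type()` and the class names) is hypothetical and chosen for this example only; the real methods in {curbcut} may be organized quite differently.

```r
# Hypothetical constructor: wrap a vector and tag it with a variable class.
new_variable <- function(values, type = c("continuous", "categorical")) {
  type <- match.arg(type)
  structure(list(values = values, type = type),
            class = c(paste0("var_", type), "cc_var"))
}

# Cartography rule for a single variable, chosen by S3 dispatch on its class.
map_style <- function(var, ...) UseMethod("map_style")
map_style.var_continuous <- function(var, ...) "binned choropleth"
map_style.default <- function(var, ...) {
  stop("no cartography rule defined for class: ", class(var)[1])
}

# S3 dispatches on one object, so a pair of variables is wrapped in an
# object whose class encodes both variable types.
var_pair <- function(x, y) {
  structure(list(x = x, y = y),
            class = c(paste("pair", x$type, y$type, sep = "_"), "cc_pair"))
}

# Graphing rule for a pair of variables.
graph_type <- function(pair, ...) UseMethod("graph_type")
graph_type.pair_continuous_continuous <- function(pair, ...) "scatterplot"
graph_type.pair_continuous_categorical <- function(pair, ...) "boxplot"
graph_type.pair_categorical_continuous <- function(pair, ...) "boxplot"
graph_type.default <- function(pair, ...) {
  stop("no graphing rule defined for class: ", class(pair)[1])
}

income <- new_variable(c(52000, 61000, 48000), "continuous")
rent   <- new_variable(c(900, 1400, 1100), "continuous")
tenure <- new_variable(factor(c("owner", "renter", "renter")), "categorical")

single_map   <- map_style(income)
pair_numeric <- graph_type(var_pair(income, rent))
pair_mixed   <- graph_type(var_pair(income, tenure))
```

Supporting a new data type in this sketch would mean adding one more class and defining `map_style` and `graph_type` methods for it, without touching the existing methods.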

This methods-based architecture for data analysis allows the Curbcut analytical platform to be far more flexible and dynamic than solutions which solve specific problems in a narrow fashion, but also far more detailed and expressive than “lowest common denominator” solutions which simply present all types of data as equivalent. The Curbcut class/methods system addresses a common set of challenges for geospatial analysis and solves them at scale. There is a large domain of analyses which are achievable for a trained geospatial technician or researcher but generally rely on specific, one-off data manipulations and processing tasks. For example, comparing the change in a variable between multiple census years requires reconciling changing census boundaries from year to year, usually with some form of spatial interpolation. As another example, visualizing a count variable on a choropleth map requires normalizing data, while visualizing a proportion variable does not. We have worked to identify higher-order rules for carrying out these common analytical tasks: rules which are transposable from one problem set to another. The result is that the platform has both the flexibility of an extremely generic system and the depth of a highly customized system.
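
The two examples above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the platform's implementation: it uses simple areal weighting as the interpolation method (the source says only “some form of spatial interpolation”), and the overlap table, tract IDs and function names are hypothetical. In practice the overlap areas would come from intersecting the two sets of census boundaries.

```r
# Hypothetical overlap table: area shared by each 2016 tract and each 2021 tract.
overlap <- data.frame(
  id_2016 = c("A", "A", "B"),
  id_2021 = c("X", "Y", "Y"),
  area    = c(3, 1, 2),
  stringsAsFactors = FALSE
)
pop_2016 <- c(A = 400, B = 100)

# Areal weighting: split each source count in proportion to the share of the
# source zone's area that falls in each target zone, then sum by target zone.
interpolate_count <- function(counts, overlap) {
  source_area <- tapply(overlap$area, overlap$id_2016, sum)
  share <- overlap$area / as.vector(source_area[overlap$id_2016])
  contrib <- unname(counts[overlap$id_2016]) * share
  tapply(contrib, overlap$id_2021, sum)
}

# Rule for choropleth display: counts are normalized, proportions are not.
choropleth_values <- function(values, type = c("count", "proportion"),
                              denominator = NULL) {
  type <- match.arg(type)
  if (type == "proportion") return(values)
  stopifnot(!is.null(denominator))
  values / denominator
}

pop_2021  <- interpolate_count(pop_2016, overlap)
area_2021 <- tapply(overlap$area, overlap$id_2021, sum)
density   <- choropleth_values(pop_2021, "count", denominator = area_2021)
```

Areal weighting preserves the overall total (500 people before and after in this toy case), which is why it suits count variables; a proportion would instead be passed through `choropleth_values()` unchanged.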

Pages: In principle there is a practically infinite set of ways that different sustainability variables could be collected and compared. Out of this large possibility space, we build a much narrower set of pages which performantly display curated vertical slices of data in a simultaneously deep and intuitive fashion. The default page on the Curbcut platform presents a “lefthand” variable on a map and/or table with associated explanatory text and graphical analysis, and then presents a range of “righthand” variables against which the lefthand variable can be compared. By default the spatial scale of the data updates as the user zooms in and out on the map, although the scale can also be manually controlled. If the variable has time series data available, the user will be able to scrub through time periods and orchestrate point-to-point time comparisons. When the user selects a spatial feature on the map, the associated explanatory text and visualizations are updated. The user can switch back and forth between map and table views of the same data, and state is preserved. Finally, a set of “did you know” factoids are randomly generated based on the map state, which provide opportunities for users to discover related variable combinations or pages elsewhere on the platform. We have created a wide range of curated analytical modules corresponding to different key sustainability problems and issues, grouped under the headings of “Climate”, “Demographics”, “Ecology”, “Economy”, “Health”, “Housing”, “Land use”, “Resources”, “Transport”, and “Urban life”. But an inherent strength of the Curbcut platform is that new pages can easily be created to address new analytical tasks, since the pages are built out of composable data analysis methods. Each instance of the Curbcut web app (i.e. each individual city’s portal) is a self-contained package built from these four components (datasets, multi-scalar data architecture, composable and extensible methods, and pages), with common shared infrastructure but different content.
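
The zoom-driven scale switching described for the default page can be sketched as a simple threshold lookup. The scale names and zoom thresholds below are hypothetical placeholders, as is the function `scale_for_zoom()`; the source does not say how the platform actually chooses a scale.

```r
# Hypothetical scales, ordered coarse to fine, each with the minimum zoom
# level at which it becomes the active scale.
scale_min_zoom <- c(
  borough = 0,
  census_tract = 10.5,
  dissemination_area = 12.5,
  building = 15.5
)

# Pick the scale for a zoom level, unless the user has fixed one manually.
scale_for_zoom <- function(zoom, min_zoom = scale_min_zoom, manual = NULL) {
  if (!is.null(manual)) {
    stopifnot(manual %in% names(min_zoom))
    return(manual)
  }
  # findInterval() returns the index of the last threshold <= zoom;
  # pmax() guards against zoom values below the first threshold.
  names(min_zoom)[pmax(1L, findInterval(zoom, min_zoom))]
}

auto_scales  <- scale_for_zoom(c(9, 11, 13, 16))
fixed_scale  <- scale_for_zoom(16, manual = "census_tract")
```

Keeping the thresholds in one named vector means each instance could, in principle, supply its own set of scales without changing the lookup logic.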

System architecture

Curbcut is powered by three R software packages: {cc.builder}, {cc.data}, and {curbcut}. {cc.builder} supplies the functionality necessary to build a Curbcut instance. Its functions create a repository with the necessary directory and file structure, import datasets by interfacing with {cc.data} and possibly other custom data sources, create the SQLite database to host datasets internally, build vector tilesets and deploy them to private Mapbox servers for mapping, and generate the scripts which allow for building the Curbcut Docker image and deploying it to AWS.

{cc.data} allows controlled access to pre-processed national datasets which are hosted on private AWS buckets. We generate internal API tokens which allow access to some subset of these datasets. The primary means of interfacing with the datasets is during the development process through {cc.builder}.

{curbcut} hosts the runtime functions which are used by a given Curbcut web app. For example, the function curbcut::render_explore_graph() generates the dynamic graphs which are found in the “Explore” panel of most Curbcut pages. As we add new functionality to this function (or fix bugs in it), all Curbcut instances inherit these improvements.

A given Curbcut instance is an R Shiny app which lives in a Git repository. The typical development process for a new Curbcut instance involves using {cc.builder} to initialize the repository and generate the basic code templates used for the individual pages of the app, and then writing code manually to add additional functionality and customization for an instance.

Shiny app: https://montreal.curbcut.ca
Repo: GitHub - Curbcut/curbcut-montreal: Sustainability App for the region of Greater Montreal displaying the work and findings of McGill MSSI researchers in the form of interactive dashboards.
