The topologyR package builds and analyzes topological structures from numerical data (e.g., time series). The central idea is to use these structures (topological subbases and bases) to study how observations connect to one another under a proximity threshold. In other words, it determines whether the data form a connected set (all observations are interconnected) or split into separate components. This result guides a practical decision rule: if the resulting topology is connected, it makes sense to use globally continuous methods (splines, kriging, etc.); if it is disconnected, it is advisable to process each component separately (segment-wise modeling or imputation).
The main functionality of topologyR includes:
- Constructing topological subbases and bases from the data, using tolerance thresholds to define neighborhoods.
- Deriving the complete topology and inspecting its connectivity with graph searches (directed and undirected graphs).
- Suggesting and exploring robust thresholds: mean/median distances between adjacent points, factor-based IQR, or DBSCAN-type heuristics.
- Applying a clear decision rule: connected topology → global continuous methods; disconnected topology → segment-wise imputation or modeling.
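To make the threshold heuristics above a bit more concrete, here is a minimal base R sketch of how such candidate thresholds could be computed by hand. It does not call topologyR: `candidate_thresholds()` and its `iqr_factor` argument are hypothetical names introduced only for illustration, and reading "factor-based IQR" as the IQR of the data divided by a factor is an assumption about what that heuristic means.

```r
# Candidate proximity thresholds for a numeric vector (base R only).
# NOTE: candidate_thresholds() is a hypothetical helper, not a topologyR function.
candidate_thresholds <- function(x, iqr_factor = 4) {
  stopifnot(is.numeric(x), length(x) >= 2, iqr_factor > 0)
  gaps <- diff(sort(x))  # distances between adjacent (sorted) points
  c(
    mean_gap   = mean(gaps),
    median_gap = median(gaps),
    # Assumption: "factor-based IQR" read as IQR of the data divided by a factor.
    iqr_based  = IQR(x) / iqr_factor
  )
}

x <- c(1, 2, 3, 10, 11, 12)
th <- candidate_thresholds(x)
print(th)
```

With data like this, the mean gap is pulled up by the single large jump while the median gap is not, which is one reason to compare several candidates rather than trust a single one.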
For example, the is_topology_connected function indicates whether the topology is connected (TRUE) or not (FALSE):
library(topologyR)
x <- c(1, 2, 3, 10, 11, 12)
topo <- complete_topology(x)
is_topology_connected(topo$topology) # FALSE (segmented data)
y <- 1:10
topo2 <- complete_topology(y)
is_topology_connected(topo2$topology) # TRUE (connected set)
In the first case the output is FALSE (the topology is fragmented: note the large gap between 3 and 10), while the evenly spaced vector 1:10 returns TRUE (everything is connected). These results illustrate how topologyR helps you decide whether to treat the data as a continuous whole or as independent segments.
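For the disconnected case, the "process each segment separately" step can be sketched in base R as follows. This is only a rough one-dimensional analogue, not how topologyR determines connectivity internally: `split_by_gap()` is a hypothetical helper, and the threshold of 2 is an arbitrary value chosen for the illustration.

```r
# Split a numeric vector into runs of values whose adjacent (sorted) gaps
# stay within `threshold`. Base R only.
# NOTE: split_by_gap() is a hypothetical helper, not a topologyR function.
split_by_gap <- function(x, threshold) {
  stopifnot(is.numeric(x), length(x) >= 1, threshold >= 0)
  xs <- sort(x)
  # A new segment starts wherever the gap to the previous value exceeds the threshold.
  segment_id <- cumsum(c(TRUE, diff(xs) > threshold))
  unname(split(xs, segment_id))
}

x <- c(1, 2, 3, 10, 11, 12)
segments <- split_by_gap(x, threshold = 2)
length(segments)  # 2: the gap between 3 and 10 exceeds the threshold

y <- 1:10
length(split_by_gap(y, threshold = 2))  # 1: no gap exceeds the threshold

# Disconnected case: summarise (or model, or impute) each segment on its own.
segment_means <- sapply(segments, mean)
print(segment_means)  # 2 and 11
```

Here the per-segment mean stands in for whatever model or imputation method you would actually fit to each component.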
The topologyR package is available on GitHub at IsadoreNabi/topologyR. You can install it with the remotes package:
# install.packages("remotes")  # run this first if remotes is not installed
remotes::install_github("IsadoreNabi/topologyR")
You'll find more details and examples in my Philosophy of Statistics blog posts (English and Spanish versions). There I explain the theoretical motivation and use cases in econometrics, neuroscience, and climatology. I hope you'll try it and let me know what you think!