Hello Posit Community,
I'm excited to share HTDV (Hypothesis Testing for Dependent Variables with Unbalanced Data), a new R package currently available via GitHub. It is designed for applied statisticians, econometricians, and researchers dealing with time-dependent, spatially dependent, or heavily imbalanced datasets where textbook i.i.d. assumptions break down.
What does it do?
Drawing from its mathematical foundations under strong-mixing conditions, HTDV answers a single core inferential question: do these dependent and possibly unequally-sized samples come from a common population?
To do this under the worst combination of nuisance conditions (e.g., temporal dependence with polynomial decay, heavy tails, and severe group imbalance), the package runs three independent inferential layers in parallel:
- Hierarchical Bayesian estimation via Hamiltonian Monte Carlo (using rstan for Whittle and composite likelihoods).
- A heteroskedasticity-and-autocorrelation-robust (HAR) Wald test with fixed-bandwidth critical values.
- A stationary block bootstrap with automatic block length.
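To give a feel for the third layer, here is a minimal base-R sketch of a stationary bootstrap test for equal means in two dependent samples of unequal size. It is not HTDV's implementation or API: the helper names (`stationary_resample`, `boot_mean_diff_test`) are hypothetical, the data are simulated, the block length uses a simple n^(1/3) rule of thumb rather than automatic selection, and it only compares means rather than whole distributions.

```r
set.seed(1)

# Illustrative data only: two AR(1) series with the same mean but very different lengths
x <- as.numeric(arima.sim(list(ar = 0.6), n = 400))
y <- as.numeric(arima.sim(list(ar = 0.6), n = 60))

# One stationary-bootstrap resample: each step either continues the current
# block (wrapping around the end of the series) or, with probability
# 1 / mean_block, jumps to a new uniformly chosen start. Block lengths are
# therefore geometric with mean `mean_block`.
stationary_resample <- function(z, mean_block) {
  n <- length(z)
  idx <- integer(n)
  idx[1] <- sample.int(n, 1)
  for (t in seq_len(n)[-1]) {
    if (runif(1) < 1 / mean_block) {
      idx[t] <- sample.int(n, 1)       # start a new block
    } else {
      idx[t] <- idx[t - 1] %% n + 1    # continue the block, wrapping at n
    }
  }
  z[idx]
}

# Bootstrap test of H0: equal means, resampling each series separately
# and centering the bootstrap statistic at the observed difference.
boot_mean_diff_test <- function(x, y, B = 999) {
  obs <- mean(x) - mean(y)
  bx <- ceiling(length(x)^(1 / 3))     # assumed rule-of-thumb block length
  by <- ceiling(length(y)^(1 / 3))
  boot <- replicate(B, {
    mean(stationary_resample(x, bx)) - mean(stationary_resample(y, by)) - obs
  })
  p_value <- (1 + sum(abs(boot) >= abs(obs))) / (B + 1)
  list(statistic = obs, p_value = p_value)
}

res <- boot_mean_diff_test(x, y)
print(res)
```

Resampling each series on its own keeps the imbalance between the groups intact, and the geometric block lengths preserve short-range dependence inside each resample.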
At its core, HTDV relies on a formal metric equivalence theorem between three convergence regimes (Triangular Arrays, Weighted Sums with Correlation, and Mixingale Processes). This allows the framework to dispatch the correct likelihood to your data-generating hypothesis while preserving a unified, mathematically defensible inferential pipeline.
Why is it useful?
Testing dependent data usually means committing to a single method, and any single method can lose nominal calibration under stress (such as high persistence or small samples). By exposing the disagreement between a Berger-robust Bayesian envelope and finite-sample frequentist anchors, HTDV provides a clear calibration signal. Where all three layers concur, your inferential conclusion is robust. Where they disagree, the framework flags the failure mode and points you to the most defensible decision.
Getting Started:
- Install from GitHub:
  remotes::install_github("IsadoreNabi/HTDV", build_vignettes = TRUE)
- Browse the Code & Issues: GitHub - IsadoreNabi/HTDV
I highly recommend checking out the HTDV-validation vignette, which includes a 1,024-cell pre-registered factorial Monte Carlo simulation and empirical benchmarks that reproduce classic macroeconomic studies (such as FRED-MD CPI inflation and the Shiller log-CAPE).
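If the package was installed with build_vignettes = TRUE, the vignette should be reachable from the R console. A small sketch using only base R's `vignette()`; the topic name "HTDV-validation" is assumed from the name given in this post and may differ in the installed package:

```r
# Only proceed if HTDV is installed (it is distributed via GitHub, not CRAN)
has_htdv <- requireNamespace("HTDV", quietly = TRUE)

if (has_htdv) {
  # List the vignettes that were built at install time
  print(vignette(package = "HTDV"))
  # Open the validation vignette; the topic name is an assumption
  print(vignette("HTDV-validation", package = "HTDV"))
} else {
  message("HTDV is not installed; see the install command above.")
}
```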
I welcome any feedback, bug reports, or contributions on GitHub. What challenges do you typically face when testing hypotheses on highly persistent or unbalanced series?
Best,
José Mauricio Gómez Julián