When a new pathogen variant emerges, public health teams need to estimate its growth advantage and forecast when it will dominate. The lineagefreq R package provides a reproducible workflow for these tasks using genomic surveillance count data.
Installation
install.packages("lineagefreq")
Working with Real CDC Data
The package ships with real CDC SARS-CoV-2 surveillance data. The examples below analyse the emergence of the JN.1 lineage in late 2023:
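library(lineagefreq)
data(cdc_sarscov2_jn1)  # bundled CDC SARS-CoV-2 counts covering the JN.1 emergence
# Build an lfq_data object, naming the lineage, date and count columns
x <- lfq_data(cdc_sarscov2_jn1, lineage = lineage, date = date, count = count)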
Fitting a Model
fit <- fit_model(x, engine = "mlr")  # multinomial logistic regression engine
This fits a multinomial logistic regression: each lineage gets an intercept and a growth rate, estimated by maximum likelihood.
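For readers who want to see the model itself, here is a minimal base-R sketch written from scratch for illustration, not taken from lineagefreq. It simulates weekly counts for three hypothetical lineages (A, B, C), fixes the reference lineage A's intercept and growth rate at zero (one common identifiability constraint; lineagefreq's choice of reference may differ), and maximises the multinomial log-likelihood with optim().

```r
# Multinomial logistic regression sketch in base R (not lineagefreq's code).
# log p_ij = eta_ij - log(sum_k exp(eta_ik)),  eta_ij = a_j + r_j * t_i,
# with the reference lineage (first column) fixed at a = 0, r = 0.

mlr_logfreq <- function(a, r, t) {
  eta <- outer(t, r) + matrix(a, nrow = length(t), ncol = length(a), byrow = TRUE)
  m <- apply(eta, 1, max)
  eta - (m + log(rowSums(exp(eta - m))))  # log-sum-exp keeps exp() from overflowing
}

# Simulate weekly counts for hypothetical lineages A (reference), B and C
set.seed(1)
days   <- seq(0, 84, by = 7)
true_a <- c(0, -1, -4)
true_r <- c(0, 0.02, 0.08)                # daily growth rates relative to A
p_true <- exp(mlr_logfreq(true_a, true_r, days))
counts <- t(apply(p_true, 1, function(p) rmultinom(1, size = 500, prob = p)))

unpack <- function(theta) {               # theta = (a_B, a_C, r_B, r_C)
  list(a = c(0, theta[1:2]), r = c(0, theta[3:4]))
}

neg_loglik <- function(theta) {
  pars <- unpack(theta)
  -sum(counts * mlr_logfreq(pars$a, pars$r, days))  # multinomial log-lik up to a constant
}

neg_grad <- function(theta) {
  pars  <- unpack(theta)
  p     <- exp(mlr_logfreq(pars$a, pars$r, days))
  resid <- counts - rowSums(counts) * p   # n_ij - N_i * p_ij
  -c(colSums(resid)[-1], colSums(days * resid)[-1])
}

# parscale tells optim() the growth rates are ~100x smaller than the intercepts
mle <- optim(rep(0, 4), neg_loglik, neg_grad, method = "BFGS",
             control = list(parscale = c(1, 1, 0.01, 0.01), maxit = 500))
est <- setNames(mle$par, c("a_B", "a_C", "r_B", "r_C"))
round(est, 3)
```

Because A is pinned at zero, r_B and r_C are growth advantages over A; the advantage of C over B is simply r_C - r_B.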
Growth Advantages
growth_advantage(fit)                                             # default scale
growth_advantage(fit, type = "relative_Rt", generation_time = 5)  # 5-day generation time
growth_advantage(fit, type = "doubling_time")
A relative Rt of 1.3 means 30% more transmission per generation than the lineage it is compared against.
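As a rough check on how these scales relate, the sketch below uses the common exponential-growth approximations: relative Rt ≈ exp(r × g) for a relative growth rate r per day and a fixed generation time g in days, and log(2) / r for the time the variant's odds against its comparator take to double. These formulas are assumptions for illustration; lineagefreq's own conversions (for instance the Piantham engine) may differ, e.g. by accounting for a full generation-interval distribution.

```r
# Exponential-growth approximations (illustrative, not lineagefreq internals).
# r: relative growth rate per day; g: fixed generation time in days

relative_rt   <- function(r, g) exp(r * g)
doubling_time <- function(r) log(2) / r      # days for the odds vs comparator to double
rate_from_rt  <- function(rt, g) log(rt) / g # inverse of relative_rt()

r <- 0.05                                    # hypothetical advantage, per day
relative_rt(r, g = 5)                        # about 1.28
doubling_time(r)                             # about 13.9 days

# Working backwards: a relative Rt of 1.3 with a 5-day generation time
rate_from_rt(1.3, g = 5)                     # about 0.052 per day
```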
Forecasting with Uncertainty
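# Forecast over a 28-day horizon using 1000 simulations
fc <- forecast(fit, horizon = 28, n_sim = 1000)
autoplot(fc)  # plot the forecast with its uncertainty bands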
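The general idea behind simulation-based forecast bands can be sketched for a two-lineage logistic model: draw the intercept and growth rate jointly from a normal approximation to their sampling distribution, project each draw forward, and read off pointwise quantiles. The estimates and covariance matrix below are made-up numbers, and this parametric-simulation approach is an assumption about the technique in general, not a description of what forecast() does internally.

```r
# Simulation-based forecast band for a two-lineage logistic model (sketch).
# Day 0 is the last observed day; all numbers are hypothetical.
set.seed(42)
n_sim     <- 1000
horizon   <- 28
theta_hat <- c(a = -2.0, r = 0.06)   # logit frequency at day 0, growth rate per day
vcov_hat  <- matrix(c(0.01,   0.0002,
                      0.0002, 0.00001), nrow = 2)  # covariance of (a, r)

# Draw parameters from the normal approximation: z %*% chol(V) has covariance V
z     <- matrix(rnorm(2 * n_sim), ncol = 2)
draws <- z %*% chol(vcov_hat) + matrix(theta_hat, n_sim, 2, byrow = TRUE)

# Project each draw forward on the logit scale, then map back to frequencies
days <- seq_len(horizon)
freq <- plogis(draws[, 1] + outer(draws[, 2], days))  # n_sim x horizon

# Pointwise median and 95% band
bands <- apply(freq, 2, quantile, probs = c(0.025, 0.5, 0.975))
fc_sketch <- data.frame(day = days, lower = bands[1, ],
                        median = bands[2, ], upper = bands[3, ])
tail(fc_sketch, 3)
```

Drawing a and r jointly through the Cholesky factor keeps their correlation; sampling them independently would misstate how quickly the band widens with the horizon.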
Honest Forecast Evaluation
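# Rolling-origin backtest of two engines at 7- to 28-day horizons;
# min_train sets the minimum training window before the first origin
bt <- backtest(x, engines = c("mlr", "piantham"),
               horizons = c(7, 14, 21, 28), min_train = 42)
# Score by mean absolute error, interval coverage and weighted interval score
sc <- score_forecasts(bt, metrics = c("mae", "coverage", "wis"))
compare_models(sc)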
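Rolling-origin backtesting refits each model using only the data available at each forecast origin and scores its predictions against observations that came later. This avoids the common mistake of reporting in-sample fit as forecast accuracy.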
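To make the distinction concrete, here is a base-R sketch of rolling-origin evaluation for one variant against all others, using a binomial logistic regression via glm() as a stand-in forecasting engine. The simulated data are hypothetical, and the 42-day minimum window and 7/14/21/28-day horizons only mirror the call above; this illustrates the technique, not backtest()'s implementation.

```r
# Rolling-origin evaluation sketch (illustrative, not lineagefreq's backtest()).
set.seed(7)
days  <- seq(0, 140, by = 7)              # weekly sampling, in days
n_seq <- rep(300, length(days))           # sequences per week (hypothetical)
p     <- plogis(-4 + 0.06 * days)         # true variant frequency
k     <- rbinom(length(days), n_seq, p)   # variant counts
obs   <- data.frame(day = days, k = k, n = n_seq)

min_train <- 42
horizons  <- c(7, 14, 21, 28)
results   <- list()

for (origin in obs$day[obs$day >= min_train]) {
  train <- obs[obs$day <= origin, ]       # only data available at the origin
  fit_o <- glm(cbind(k, n - k) ~ day, family = binomial, data = train)
  for (h in horizons) {
    target <- obs[obs$day == origin + h, ]
    if (nrow(target) == 0) next           # horizon runs past the end of the data
    pred <- predict(fit_o, newdata = data.frame(day = origin + h), type = "response")
    results[[length(results) + 1]] <- data.frame(
      origin = origin, horizon = h,
      error  = abs(unname(pred) - target$k / target$n)
    )
  }
}

scores <- do.call(rbind, results)
aggregate(error ~ horizon, data = scores, FUN = mean)  # MAE by horizon
```

Every error is computed against a week the model had not seen when it was fitted; scoring fitted values on the training weeks instead would typically report a smaller, misleading error.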
Key Features
- Five engines: MLR, hierarchical MLR, Piantham Rt conversion, and two Bayesian engines via Stan
- Built-in backtesting: rolling-origin out-of-sample evaluation
- Real data included: two CDC SARS-CoV-2 datasets for immediate validation
- Broom integration: tidy(), glance() and augment() work as expected
- Surveillance tools: sequencing_power() and summarize_emerging() for programme planning
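For programme planning, a common back-of-the-envelope calculation treats sequences as independent random samples: if a variant is at frequency p, the chance of seeing at least one of n sequences is 1 - (1 - p)^n, so detecting it with probability at least c needs n ≥ log(1 - c) / log(1 - p). This model is an assumption for illustration; sequencing_power() may use a different or richer one.

```r
# Detection probability under independent random sampling (illustrative sketch;
# not necessarily the model behind sequencing_power()).
detect_prob <- function(n, p) 1 - (1 - p)^n

# Smallest n giving at least `conf` probability of observing >= 1 variant sequence
n_required <- function(p, conf = 0.95) ceiling(log(1 - conf) / log(1 - p))

n_required(0.01)        # variant at 1%: 299 sequences
n_required(0.001)       # variant at 0.1%
detect_prob(299, 0.01)  # just above 0.95
```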
What It Is Not
It is not a replacement for specialised tools such as BEAST (phylodynamics) or Nextstrain's evofr (variant-frequency modelling). It is a lighter-weight, CRAN-distributed alternative for teams that need reproducible frequency analysis without setting up Stan infrastructure, although the Bayesian engines are available if cmdstanr is installed.
Links
- CRAN: Package lineagefreq
- GitHub: CuiweiG/lineagefreq
Happy to discuss methodology or answer questions.