| Title: | Evidence-Based Bayesian Disaggregation of Aggregate Indices |
| Version: | 0.2.1 |
| Depends: | R (≥ 4.1.0) |
| Description: | Disaggregates an observed aggregate price index into sectoral components with a Bayesian state-space model in which the aggregate enters as a genuine observation density rather than as a renormalization identity. A random-walk-with-drift transition in log space (with partial pooling on the drift and the innovation scale) and an estimable cross-sectional concentration produce posterior draws of the sectoral indices with credible intervals, suitable as multiple-imputation input for downstream dynamic models. The Hamiltonian Monte Carlo engine follows Stan (Carpenter et al., 2017) <doi:10.18637/jss.v076.i01>; model comparison uses Pareto Smoothed Importance Sampling Leave-One-Out cross-validation (Vehtari, Gelman and Gabry, 2017) <doi:10.1007/s11222-016-9696-4>. A closed-form linear-Gaussian Kalman/RTS smoother provides an exact, MCMC-free Bayesian alternative for the same aggregate evidence. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | readxl, dplyr, tidyr, stringr, magrittr, stats, parallel |
| Suggests: | cmdstanr, rstan (≥ 2.21.0), posterior, loo (≥ 2.5.0), knitr, rmarkdown, ggplot2, readr, testthat (≥ 3.0.0) |
| Additional_repositories: | https://mc-stan.org/r-packages/ |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Author: | José Mauricio Gómez Julián |
| Maintainer: | José Mauricio Gómez Julián <isadore.nabi@pm.me> |
| Config/roxygen2/version: | 8.0.0 |
| Packaged: | 2026-06-18 19:27:15 UTC; josemgomezj |
| Repository: | CRAN |
| Date/Publication: | 2026-06-18 22:10:07 UTC |
Align CPI and VAB weights on their common years
Description
Reads the CPI and the weights matrix, intersects their years, and returns the aligned aggregate vector and weight matrix ready for the disaggregation engines. Both engines (state-space and conjugate) consume this output.
Usage
align_disagg_inputs(path_cpi, path_weights)
Arguments
path_cpi |
Path to the CPI Excel file (see |
path_weights |
Path to the VAB-weights Excel file
(see |
Value
A list with cpi (numeric, length T), W
(T \times K, rows sum to 1), years, industries.
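The year intersection can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `cpi_df` and `w` are hypothetical objects shaped like the outputs of read_cpi and read_weights_matrix.

```r
# Hypothetical inputs shaped like read_cpi() and read_weights_matrix() outputs
cpi_df <- data.frame(Year = 2000:2004, CPI = c(100, 103, 107, 110, 115))
w <- list(
  P = matrix(c(0.5, 0.5,
               0.6, 0.4,
               0.7, 0.3),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("A", "B"))),
  industries = c("A", "B"),
  years = 2002:2004
)

# Keep only the years present in both sources, in ascending order
common <- sort(intersect(cpi_df$Year, w$years))

# Subset both inputs to the common years so rows line up one-to-one
aligned <- list(
  cpi        = cpi_df$CPI[match(common, cpi_df$Year)],
  W          = w$P[match(common, w$years), , drop = FALSE],
  years      = common,
  industries = w$industries
)
str(aligned)
```

Using match() on the sorted common years keeps the aggregate vector and the weight rows in the same order even when the two files list years differently.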
See Also
Default prior scales for the state-space disaggregation
Description
Weakly-informative scales on the raw index level, derived from the observed
aggregate so the model is location/scale aware. Override any field by passing
a (partial) named list to disaggregate_statespace(priors = ...).
Usage
disagg_default_priors(cpi)
Arguments
cpi |
Numeric vector; the observed aggregate index (level). |
Value
Named list of prior scales (see disaggregate_statespace).
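A partial override of a named list can be sketched with utils::modifyList(). Whether the package merges overrides this way is an assumption, and the field names below are hypothetical placeholders, not the actual fields returned by disagg_default_priors().

```r
# Hypothetical defaults; the real field names come from disagg_default_priors()
defaults <- list(scale_a = 10, scale_b = 2.5, scale_c = 1)

# A partial override names only the fields to change
override <- list(scale_b = 5)

# modifyList() keeps untouched defaults and replaces entries with matching names
priors <- utils::modifyList(defaults, override)
str(priors)
```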
Stan code for the canonical disaggregation model
Description
Returns the complete Stan code of the evidence-based state-space model, read from the canonical file (single source of truth; no embedded duplicate).
Usage
disagg_stan_code()
Value
Character string with the Stan model code.
Examples
code <- disagg_stan_code()
cat(substr(code, 1, 200))
Conjugate (closed-form) disaggregation baseline
Description
Exact linear-Gaussian state-space posterior (Kalman/RTS smoother) for the
sectoral price-index levels \varphi_{t,k} given the aggregate index and
the VAB weights. Optionally returns joint posterior draws via the
Durbin-Koopman simulation smoother (so the draws can also feed
bayesianOU::fit_ou_nested_mi), and a pointwise Gaussian log-likelihood.
Usage
disaggregate_conjugate(
cpi,
W,
years = NULL,
industries = NULL,
q_frac = 0.1,
r_frac = 0.05,
p0_frac = 0.3,
n_draws = 0L,
seed = 1234L
)
Arguments
cpi |
Numeric vector (length |
W |
Numeric matrix ( |
years, industries |
Optional period and sector labels. |
q_frac |
Random-walk innovation sd as a fraction of |
r_frac |
Observation sd as a fraction of |
p0_frac |
Initial cross-sectional sd as a fraction of |
n_draws |
Integer; number of joint posterior draws (simulation smoother).
|
seed |
Integer RNG seed (used only when |
Value
An object of class "disagg_conjugate": a list with
phi_summary (median = smoothed mean, q2.5/q97.5 from the
marginal Gaussian), agg_summary, loglik (total Gaussian
log-likelihood of cpi), phi_draws ([T, K, n_draws] or
NULL), years, industries, config.
See Also
disaggregate_statespace (canonical engine).
Examples
sim <- simulate_disagg(T = 25, K = 4, seed = 7)
bl <- disaggregate_conjugate(sim$cpi, sim$W)
dim(bl$phi_summary$median)
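For orientation, a Kalman filter followed by an RTS backward pass can be sketched in base R for a random-walk state observed through one weighted sum per period. This is a generic textbook sketch, not the package's code: the function name `kalman_rts`, its arguments, and the variance values are assumptions, and the mapping from q_frac, r_frac and p0_frac to these variances is not reproduced here.

```r
# State: phi_t = phi_{t-1} + eta_t, eta_t ~ N(0, Q)
# Observation: y_t = w_t' phi_t + eps_t, eps_t ~ N(0, r2)
kalman_rts <- function(y, W, m0, P0, Q, r2) {
  T <- length(y); K <- ncol(W)
  m_pred <- m_filt <- matrix(0, T, K)
  P_pred <- P_filt <- array(0, c(K, K, T))
  loglik <- 0
  for (t in seq_len(T)) {
    # Prediction: a random walk keeps the mean and inflates the covariance by Q
    if (t == 1) {
      mp <- m0; Pp <- P0
    } else {
      mp <- m_filt[t - 1, ]; Pp <- P_filt[, , t - 1] + Q
    }
    w  <- W[t, ]
    v  <- y[t] - sum(w * mp)               # innovation
    S  <- drop(t(w) %*% Pp %*% w) + r2     # innovation variance
    Kg <- drop(Pp %*% w) / S               # Kalman gain
    m_pred[t, ] <- mp
    P_pred[, , t] <- Pp
    m_filt[t, ] <- mp + Kg * v
    P_filt[, , t] <- Pp - tcrossprod(Kg) * S
    loglik <- loglik + dnorm(v, 0, sqrt(S), log = TRUE)
  }
  # RTS backward pass (transition matrix is the identity)
  m_s <- m_filt; P_s <- P_filt
  for (t in (T - 1):1) {
    J <- P_filt[, , t] %*% solve(P_pred[, , t + 1])
    m_s[t, ] <- m_filt[t, ] + drop(J %*% (m_s[t + 1, ] - m_pred[t + 1, ]))
    P_s[, , t] <- P_filt[, , t] +
      J %*% (P_s[, , t + 1] - P_pred[, , t + 1]) %*% t(J)
  }
  list(mean = m_s, cov = P_s, loglik = loglik)
}

# Synthetic check with illustrative variances
set.seed(1)
n_t <- 20; n_k <- 3
W <- matrix(runif(n_t * n_k), n_t, n_k)
W <- W / rowSums(W)
phi <- apply(matrix(rnorm(n_t * n_k), n_t, n_k), 2, cumsum) + 100
y <- rowSums(W * phi) + rnorm(n_t, sd = 0.5)
fit <- kalman_rts(y, W, m0 = rep(y[1], n_k), P0 = diag(30^2, n_k),
                  Q = diag(1, n_k), r2 = 0.25)
dim(fit$mean)
```

The sketch assumes at least two periods and at least two sectors. The accumulated `loglik` is the prediction-error decomposition of the Gaussian likelihood, which is one standard way to obtain a total log-likelihood from a filter pass.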
Evidence-based disaggregation directly from Excel files
Description
Thin convenience wrapper: reads and aligns the CPI and VAB-weight files
(align_disagg_inputs) and runs the canonical state-space engine
(disaggregate_statespace).
Usage
disaggregate_from_files(path_cpi, path_weights, ...)
Arguments
path_cpi |
Path to the CPI Excel file (index levels, re-indexed to the
same base as the production prices; see the package vignette and the data
note on |
path_weights |
Path to the VAB-weights Excel file. |
... |
Passed to |
Value
A "disagg_statespace" object.
See Also
disaggregate_statespace, align_disagg_inputs
Examples
## Not run:
cpi_file <- system.file("extdata", "CPI.xlsx", package = "BayesianDisaggregation")
w_file <- system.file("extdata", "WEIGHTS.xlsx", package = "BayesianDisaggregation")
fit <- disaggregate_from_files(cpi_file, w_file, chains = 2, iter = 800)
## End(Not run)
Evidence-based Bayesian disaggregation (state-space; canonical engine)
Description
Disaggregates an observed aggregate index (CPI) into K latent sectoral
price indices \varphi_{t,k} with a Bayesian state-space model in which
the aggregate enters as a genuine observation density (not a renormalization
identity). The model couples a random-walk-with-drift transition in
\log\varphi (partial pooling on the drift and the innovation scale), an
estimable cross-sectional concentration, and a Student-t (or Gaussian)
observation cpi_t \mid \varphi \sim \mathrm{Student\text{-}t}(\nu,
\sum_k W_{t,k}\varphi_{t,k}, \sigma). See vignette("evidence-based-disaggregation").
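The observation density can be evaluated directly in base R. The helper below is a minimal sketch, not a package function: the name `obs_loglik` and the toy inputs are assumptions, and it only evaluates the location-scale Student-t log-density of the aggregate for a given \varphi.

```r
# Log-density of cpi_t under a location-scale Student-t centred on the
# weighted aggregate sum_k W[t, k] * phi[t, k]
obs_loglik <- function(cpi, W, phi, sigma, nu) {
  mu <- rowSums(W * phi)                    # fitted aggregate per period
  z  <- (cpi - mu) / sigma                  # standardised residual
  dt(z, df = nu, log = TRUE) - log(sigma)   # Jacobian term for the scale
}

# Toy inputs: two periods, two sectors
W   <- matrix(c(0.5, 0.5,
                0.4, 0.6), nrow = 2, byrow = TRUE)
phi <- matrix(c(100, 110,
                102, 112), nrow = 2, byrow = TRUE)
cpi <- c(105.5, 107)

ll_t <- obs_loglik(cpi, W, phi, sigma = 1, nu = 4)    # Student-t observation
ll_g <- obs_loglik(cpi, W, phi, sigma = 1, nu = Inf)  # Gaussian limit
```

Setting nu = Inf recovers the Gaussian observation, since dt() accepts df = Inf and then returns the normal density.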
Usage
disaggregate_statespace(
cpi,
W,
years = NULL,
industries = NULL,
student_obs = TRUE,
priors = NULL,
chains = 4L,
iter = 2000L,
warmup = 1000L,
thin = 1L,
cores = NULL,
adapt_delta = 0.95,
max_treedepth = 12L,
seed = 1234L,
init = 0.5,
keep_fit = TRUE,
verbose = FALSE
)
Arguments
cpi |
Numeric vector (length |
W |
Numeric matrix ( |
years |
Optional integer vector (length |
industries |
Optional character vector (length |
student_obs |
Logical; if |
priors |
Optional named list overriding |
chains, iter, warmup, thin |
Sampler controls (HMC/NUTS). Defaults
|
cores |
Integer; parallel chains. Default |
adapt_delta, max_treedepth |
NUTS tuning. Defaults |
seed |
Integer RNG seed. Default |
init |
Sampler init; a numeric scalar is an init radius (cmdstanr) or is
translated to |
keep_fit |
Logical; keep the raw Stan fit object in the result. Default
|
verbose |
Logical; print progress. Default |
Details
The returned posterior draws of \varphi (a [T, K, draws] array)
are exactly the multiple-imputation input consumed by
bayesianOU::fit_ou_nested_mi(), propagating the disaggregation
uncertainty into the downstream nested-OU analysis (Rubin's rules).
Value
An object of class "disagg_statespace": a list with
- phi_draws: [T, K, draws] numeric array of posterior draws of \varphi (the multiple-imputation input for the nested OU).
- phi_summary: List of T \times K matrices median, q2.5, q97.5 (credible bands per sector and period).
- agg_summary: T \times 3 matrix: posterior median and 95% band of the fitted aggregate \sum_k W\varphi (against which cpi is the evidence).
- years, industries: Period and sector labels.
- diagnostics: rhat_max, divergences.
- stan_fit: The Stan fit (if keep_fit).
- config: Sampler/prior configuration and T, K.
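The relationship between a [T, K, draws] array and the per-cell summaries can be sketched as follows. The array here is simulated as a hypothetical stand-in for phi_draws, and the column names of the aggregate summary are assumptions; this is not the package's internal code.

```r
set.seed(1)
n_t <- 4; n_k <- 2; n_d <- 500

# Hypothetical stand-in for a [T, K, draws] array of posterior draws
phi_draws <- array(rlnorm(n_t * n_k * n_d, meanlog = log(100), sdlog = 0.05),
                   dim = c(n_t, n_k, n_d))
W <- matrix(c(0.6, 0.4), n_t, n_k, byrow = TRUE)   # each row sums to 1

# Per-cell quantiles across the draws dimension give T x K matrices
q <- function(p) apply(phi_draws, c(1, 2), quantile, probs = p, names = FALSE)
phi_summary <- list(median = q(0.5), q2.5 = q(0.025), q97.5 = q(0.975))

# Fitted aggregate per draw, sum_k W[t, k] * phi[t, k, d], then summarised
agg_draws <- apply(phi_draws, 3, function(phi) rowSums(W * phi))   # T x draws
agg_summary <- t(apply(agg_draws, 1, quantile,
                       probs = c(0.5, 0.025, 0.975), names = FALSE))
colnames(agg_summary) <- c("median", "q2.5", "q97.5")
```

Each slice `phi_draws[, , d]` is one complete T \times K imputation of the sectoral indices, which is what makes the array usable as multiple-imputation input.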
See Also
disaggregate_conjugate (closed-form Bayesian baseline),
disaggregate_from_files, simulate_disagg.
Examples
## Not run:
set.seed(1)
sim <- simulate_disagg(T = 30, K = 4)
fit <- disaggregate_statespace(sim$cpi, sim$W, chains = 2, iter = 800)
dim(fit$phi_draws) # T x K x draws
## End(Not run)
Enable logging at a specific level
Description
Sets the package-wide logging verbosity.
Usage
log_enable(level = "INFO")
Arguments
level |
Character scalar. One of "TRACE", "DEBUG", "INFO", "WARN", "ERROR". |
Value
(Invisibly) the level set.
Log message with timestamp
Description
Internal helper that prints a timestamped message when the current log
level is at least level.
Usage
log_msg(level = "INFO", ...)
Arguments
level |
Character level: "TRACE","DEBUG","INFO","WARN","ERROR". |
... |
Message components (will be concatenated with spaces). |
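A level-gated, timestamped logger can be sketched in a few lines. Everything here is hypothetical: the names `sketch_log_enable` and `sketch_log_msg`, the severity ordering, and the rule that a message prints when its level is at or above the configured threshold are assumptions, not the package's implementation.

```r
# Assumed severity ordering, from most verbose to most severe
.levels <- c(TRACE = 1L, DEBUG = 2L, INFO = 3L, WARN = 4L, ERROR = 5L)
.state <- new.env()
.state$level <- "INFO"

sketch_log_enable <- function(level = "INFO") {
  level <- match.arg(level, names(.levels))   # reject unknown level names
  .state$level <- level
  invisible(level)
}

sketch_log_msg <- function(level = "INFO", ...) {
  # Emit only messages at or above the configured threshold
  if (.levels[[level]] >= .levels[[.state$level]]) {
    line <- paste(format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                  sprintf("[%s]", level),
                  paste(..., sep = " "))
    message(line)
    return(invisible(line))
  }
  invisible(NULL)
}

sketch_log_enable("WARN")
skipped <- sketch_log_msg("INFO", "below", "threshold")   # suppressed
shown   <- sketch_log_msg("ERROR", "disk", "full")        # printed to stderr
```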
Read CPI data from an Excel file
Description
Loads and normalizes a CPI time series from an Excel worksheet. The function
detects the date/year column and the CPI/value column by pattern-matching on
lower-cased header names, parses localized numerics (via to_num_commas()),
collapses duplicate years by averaging, and returns a clean, sorted data frame.
Usage
read_cpi(path_cpi)
Arguments
path_cpi |
Character path to the CPI Excel file. |
Details
Column detection. Headers are lower-cased and matched with:
- Date/year: patterns "date|fecha|year|anio|ano".
- CPI/value: patterns "cpi|indice|price".
If either column cannot be identified, the function errors.
Cleaning.
- Year is extracted as the first 4 digits of the date-like column.
- CPI is parsed with to_num_commas() (handles commas/thousands).
- NA rows are dropped; duplicates in Year are averaged. Output is sorted by Year ascending.
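The detection and cleaning steps can be sketched in base R on a small in-memory table. This is an illustration under stated assumptions: the `raw` data frame is hypothetical, and the comma-stripping line is a simplified stand-in for to_num_commas(), whose exact behavior is not reproduced here.

```r
# Hypothetical raw sheet as it might arrive from an Excel reader
raw <- data.frame(
  Fecha  = c("2001-01-01", "2001-07-01", "2002-01-01", "2000-01-01", NA),
  Indice = c("1,050.0", "1,070.0", "1,100.5", "1,000.0", "999"),
  stringsAsFactors = FALSE
)

# Column detection on lower-cased headers, using the documented patterns
nm <- tolower(names(raw))
date_col <- which(grepl("date|fecha|year|anio|ano", nm))[1]
cpi_col  <- which(grepl("cpi|indice|price", nm))[1]
if (is.na(date_col) || is.na(cpi_col)) {
  stop("could not identify the date or CPI column")
}

# Year: the first four digits of the date-like column
year <- as.integer(sub("^[^0-9]*([0-9]{4}).*$", "\\1", raw[[date_col]]))

# Simplified stand-in for to_num_commas(): strip thousands separators
value <- as.numeric(gsub(",", "", raw[[cpi_col]], fixed = TRUE))

# Drop incomplete rows, average duplicate years, sort ascending
df <- data.frame(Year = year, CPI = value)
df <- df[stats::complete.cases(df), ]
df <- stats::aggregate(CPI ~ Year, data = df, FUN = mean)
df <- df[order(df$Year), ]
df
```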
Value
A data.frame with two columns:
- Year (integer)
- CPI (numeric)
See Also
read_weights_matrix, align_disagg_inputs
Examples
cpi_file <- system.file("extdata", "CPI.xlsx", package = "BayesianDisaggregation")
if (nzchar(cpi_file)) {
df <- read_cpi(cpi_file)
head(df)
}
Read a weights matrix from an Excel file
Description
Loads a sector-by-year weight table, normalizes weights to the simplex per year,
and returns a list with the T \times K prior matrix P, the sector
names, and the year vector. The first column is assumed to contain sector names
(renamed to Industry); all other columns are treated as years.
Usage
read_weights_matrix(path_weights)
Arguments
path_weights |
Character path to the weights Excel file. |
Details
Expected layout. One sheet with:
- First column: sector names (any header; renamed to Industry).
- Remaining columns: years; the function extracts a 4-digit year from each header using stringr::str_extract(Year, "\\d{4}").
Values are parsed with to_num_commas(), missing rows are dropped, and
weights are normalized within each year to sum to 1. Any absent (sector, year)
entry becomes 0 when pivoting wide. Finally, rows are re-normalized with
row_norm1() for numerical safety.
Safeguards.
Rows with all-missing/zero after parsing are dropped by the filters.
If no valid year columns are found, the function errors.
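The per-year normalization to the simplex can be sketched in base R. The `raw` table below is hypothetical, and the sketch simplifies the documented pipeline (no localized-number parsing, no long/wide pivot); it only shows how a sector-by-year table becomes a T \times K matrix whose rows sum to 1.

```r
# Hypothetical sector-by-year table (first column holds sector names)
raw <- data.frame(
  Sector = c("Agriculture", "Industry", "Services"),
  `2001` = c(20, 30, 50),
  `2002` = c(10, NA, 30),
  check.names = FALSE
)

# Year columns are those whose header contains a 4-digit year
year_cols <- grep("[0-9]{4}", names(raw)[-1], value = TRUE)
if (length(year_cols) == 0) stop("no valid year columns found")

M <- as.matrix(raw[year_cols])   # K x T, sectors in rows
M[is.na(M)] <- 0                 # absent (sector, year) entries become 0
P <- t(M)                        # T x K, years in rows
P <- P / rowSums(P)              # normalise each year to sum to 1
dimnames(P) <- list(NULL, raw$Sector)

years <- as.integer(regmatches(year_cols, regexpr("[0-9]{4}", year_cols)))
P
```

A year whose weights are all zero would produce a division by zero in this sketch, which is one reason a real reader needs to filter such rows first.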
Value
A list with:
- P: T \times K numeric matrix of prior weights (rows sum to 1).
- industries: Character vector of sector names (length K).
- years: Integer vector of years (length T).
See Also
Examples
w_file <- system.file("extdata", "WEIGHTS.xlsx", package = "BayesianDisaggregation")
if (nzchar(w_file)) {
w <- read_weights_matrix(w_file)
stopifnot(is.matrix(w$P), all(abs(rowSums(w$P) - 1) < 1e-8))
str(w)
}
Simulate from the state-space disaggregation DGP
Description
Generates a synthetic aggregate index cpi, the (known) VAB weights
W, and the latent sectoral price-index paths phi_true from the
same data-generating process as disaggregate_statespace. The
innovation scale is kept modest so the log random walk stays in a numerically
stable region over the simulated horizon (the same care taken in the sibling
OU simulator).
Usage
simulate_disagg(
T = 40L,
K = 5L,
phi1_center = 100,
omega_struct = 0.3,
delta_mu = 0.02,
delta_sigma = 0.01,
tau_mu = 0.04,
tau_sigma = 0.3,
sigma_cpi = 1,
nu = Inf,
seed = 1234L
)
Arguments
T |
Integer; number of periods. |
K |
Integer; number of sectors. |
phi1_center |
Numeric; central level of the initial cross-section. |
omega_struct |
Numeric; cross-sectional log-level dispersion at t = 1. |
delta_mu, delta_sigma |
Common drift and its cross-sector dispersion. |
tau_mu, tau_sigma |
Geometric mean innovation scale and log-dispersion
(so |
sigma_cpi |
Observation noise scale on the aggregate. |
nu |
Student-t degrees of freedom of the observation ( |
seed |
Integer RNG seed. |
Value
A list with cpi (length T), W (T \times K,
rows sum to 1), phi_true (T \times K), agg_true
(length T), and params (the true scalar/vector parameters).
Examples
sim <- simulate_disagg(T = 20, K = 3, seed = 42)
str(sim$params)
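A data-generating process of this kind can be sketched in base R. The function `sketch_dgp` is hypothetical and is not simulate_disagg(): the distributional choices (normal sector drifts, log-normal innovation scales, gamma-normalized weights, Gaussian observation noise) are assumptions made for illustration, with argument names borrowed from the usage above.

```r
sketch_dgp <- function(T = 40L, K = 5L, phi1_center = 100, omega_struct = 0.3,
                       delta_mu = 0.02, delta_sigma = 0.01,
                       tau_mu = 0.04, tau_sigma = 0.3,
                       sigma_cpi = 1, seed = 1234L) {
  set.seed(seed)
  # Sector-specific drift and innovation scale drawn around common values
  # (assumed forms of the partial pooling)
  delta <- rnorm(K, delta_mu, delta_sigma)
  tau   <- exp(rnorm(K, log(tau_mu), tau_sigma))

  # Random walk with drift in log(phi), started from a dispersed cross-section
  log_phi <- matrix(NA_real_, T, K)
  log_phi[1, ] <- log(phi1_center) + rnorm(K, 0, omega_struct)
  for (t in 2:T) {
    log_phi[t, ] <- log_phi[t - 1, ] + delta + tau * rnorm(K)
  }
  phi <- exp(log_phi)

  # Illustrative weights on the simplex: normalised gamma draws per period
  W <- matrix(rgamma(T * K, shape = 5), T, K)
  W <- W / rowSums(W)

  # Aggregate plus Gaussian observation noise (the nu = Inf case)
  agg <- rowSums(W * phi)
  cpi <- agg + rnorm(T, 0, sigma_cpi)

  list(cpi = cpi, W = W, phi_true = phi, agg_true = agg,
       params = list(delta = delta, tau = tau))
}

demo <- sketch_dgp(T = 20, K = 3, seed = 42)
str(demo$params)
```

Working in log space keeps every simulated index strictly positive, and a small innovation scale keeps exp() of the random walk from drifting to extreme values over the horizon.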