KL-Optimality: designs for model discrimination

library(optedr)

Background

Standard optimality criteria (D, A, I, …) assume the model is correctly specified and aim to estimate its parameters as precisely as possible. KL-Optimality addresses a different question: given two competing models, where should observations be taken to tell them apart most easily?

The criterion is based on the Kullback-Leibler divergence between the reference model \(f_1\) and the best-fitting version of the rival \(f_2\):

\[\Phi_{\text{KL}}(\xi) = \int \min_{\beta_2} \text{KL}(f_1(y \mid x) \,\|\, f_2(y \mid x, \beta_2))\, \xi(dx).\]

The design \(\xi^*\) that maximises \(\Phi_{\text{KL}}\) places observations where the two models are most distinguishable on average.

Reference: López-Fidalgo, Tommasi & Trandafir (2007). An optimal experimental design criterion for discriminating between non-normal models. J. R. Stat. Soc. B, 69(2), 231–242.

API overview

opt_des(
  "KL-Optimality",
  model        = <reference model formula>,
  parameters   = <reference parameter names>,
  par_values   = <nominal values>,
  design_space = <scalar interval or named list>,
  # rival specification:
  rival_model  = <rival formula>,       # default: same as model
  rival_params = <rival param names>,   # default: same as parameters
  rival_pars   = <starting values for internal optimisation>,
  rival_lower  = <lower bounds for rival params>,
  rival_upper  = <upper bounds for rival params>,
  # GLM family:
  family       = "Normal",   # "Poisson", "Binomial", "Gamma"
  phi          = 1,          # dispersion parameter
  # alternative: supply a custom KL function
  kl_fun       = NULL
)

The internal optimisation over rival parameters is done automatically. The optimal rival values are stored as a hidden attribute and shown by summary().

Example 1: Michaelis-Menten vs linear rival (Normal, 1D)

Context. Enzyme kinetics with one substrate. The reference model is Michaelis-Menten \(\mu(x) = V_{\max} x / (K_m + x)\) with \(V_{\max} = 2\), \(K_m = 1\). The rival is a linear rate \(\mu(x) = a x\), valid only when \(x \ll K_m\). At high concentrations MM saturates while the linear model keeps growing — that is where the models diverge most.

result_kl_mm <- opt_des(
  "KL-Optimality",
  model        = y ~ Vmax * x / (Km + x),
  parameters   = c("Vmax", "Km"),
  par_values   = c(2, 1),
  design_space = c(0.1, 5),
  rival_model  = y ~ a * x,
  rival_params = c("a"),
  rival_pars   = c(1),
  rival_lower  = c(0.2),
  rival_upper  = c(2.5),
  family       = "Normal",
  phi          = 1
)
#> 
#> ℹ Stop condition reached: difference between sensitivity and criterion < 1e-05
#> 
⠙ Calculating optimal design 7 done (11/s) | 626ms
ℹ The lower bound for efficiency is 99.9957589718079%
#> 
                                                   
result_kl_mm
#> Optimal design for KL-Optimality:
#>      Point    Weight
#> 1 1.127578 0.8159831
#> 2 5.000000 0.1840169

summary() shows the GLM family and the optimal rival parameters found by the internal optimisation:

summary(result_kl_mm)
#> Model:
#> y ~ Vmax * x/(Km + x)
#> Weight function:
#> function (x) 
#> 1 
#> 
#> GLM family: Normal  (phi = 1)
#> Optimal rival parameters: 0.445
#> 
#> Optimal design for KL-Optimality:
#>      Point    Weight
#> 1 1.127578 0.8159831
#> 2 5.000000 0.1840169
#> 
#> Minimum efficiency (Atwood): 99.9957589718079%
#> Criterion value:             0.155804

The sensitivity plot shows \(d_{\text{KL}}(x, \xi)\) with the Equivalence Theorem bound (dashed). Support points concentrate in the saturation region where MM and the linear rival differ most:

plot(result_kl_mm)
KL sensitivity function: MM vs linear rival.

KL sensitivity function: MM vs linear rival.

Efficiency of a five-point uniform design relative to the KL-optimal:

design_unif <- data.frame(
  Point  = c(0.1, 1.3, 2.5, 3.8, 5.0),
  Weight = rep(1/5, 5)
)
eff_kl <- design_efficiency(design_unif, result_kl_mm)
#> ℹ The efficiency of the design is 44.1071528869909%
cat("Efficiency of uniform design:", round(eff_kl * 100, 2), "%\n")
#> Efficiency of uniform design: 44.11 %

Example 2: exponential decay, Poisson family

Context. A drug clearance study where the event count follows a Poisson process with rate \(\mu(x) = \exp(a - bx)\). The reference model has nominal decay rate \(b = 0.5\). The rival represents a faster-elimination regime (\(b \in [0.8, 1.5]\)), which could arise from drug interactions. The KL divergence for Poisson is \(\text{KL} = \mu_2 - \mu_1 - \mu_1 \log(\mu_2/\mu_1)\); since the rival decays faster, the gap is largest at long follow-up times.

result_kl_pois <- opt_des(
  "KL-Optimality",
  model        = y ~ exp(a - b * x),
  parameters   = c("a", "b"),
  par_values   = c(2, 0.5),
  design_space = c(0, 4),
  rival_pars   = c(2, 1.0),
  rival_lower  = c(1.5, 0.8),
  rival_upper  = c(2.5, 1.5),
  family       = "Poisson",
  phi          = 1
)
#> 
#> ℹ Stop condition reached: difference between sensitivity and criterion < 1e-05
#> 
⠙ Calculating optimal design 4 done (7.2/s) | 555ms
ℹ The lower bound for efficiency is 99.9999938849073%
#> 
                                                    
result_kl_pois
#> Optimal design for KL-Optimality:
#>   Point    Weight
#> 1     0 0.2012678
#> 2     4 0.7987322
summary(result_kl_pois)
#> Model:
#> y ~ exp(a - b * x)
#> Weight function:
#> function (x) 
#> 1 
#> 
#> GLM family: Poisson  (phi = 1)
#> Optimal rival parameters: 2.2799, 0.8
#> 
#> Optimal design for KL-Optimality:
#>   Point    Weight
#> 1     0 0.2012678
#> 2     4 0.7987322
#> 
#> Minimum efficiency (Atwood): 99.9999938849073%
#> Criterion value:             0.318553
plot(result_kl_pois)
KL sensitivity for Poisson decay model.

KL sensitivity for Poisson decay model.

The optimal rival parameters found internally confirm the closest rival within the constraint:

hv <- attr(result_kl_pois, "hidden_value")
cat("Optimal rival: a =", round(hv$beta2_star[1], 3),
    " b =", round(hv$beta2_star[2], 3), "\n")
#> Optimal rival: a = 2.28  b = 0.8

Using make_kl_fun() for custom KL functions

For some discrimination problems the KL divergence does not reduce to the standard cumulant-generating-function form that opt_des() uses internally. make_kl_fun() automates the construction of the KL function from two model-family pairs.

Supported pairs

Combination Formula
Same family, same \(\phi\) Standard cumulant form
Normal vs Normal, \(\phi_1 \neq \phi_2\) \(\frac{1}{2}\left[\log\frac{\phi_2}{\phi_1} + \frac{\phi_1}{\phi_2} + \frac{(\mu_1-\mu_2)^2}{\phi_2} - 1\right]\)
Gamma vs Gamma, different shape Closed form with digamma/lgamma

For other pairs, supply kl_fun directly as a function (x, beta2) -> scalar.

Example 3: Normal with different variances (1D)

Context. Same exponential decay model, but now the rival has variance \(\phi_2 = 4\) (four times larger). The KL function includes a constant term \(\frac{1}{2}[\log(\phi_2/\phi_1) + \phi_1/\phi_2 - 1]\) even when the means coincide, so the design must also account for the variance difference.

kl_fn_var <- make_kl_fun(
  "Normal",
  model1      = y ~ a * exp(-b * x),
  params1     = c("a", "b"),
  par_values1 = c(1, 0.5),
  phi1        = 1,
  family2     = "Normal",
  model2      = y ~ c * exp(-d * x),
  params2     = c("c", "d"),
  phi2        = 4
)

result_kl_var <- opt_des(
  "KL-Optimality",
  model        = y ~ a * exp(-b * x),
  parameters   = c("a", "b"),
  par_values   = c(1, 0.5),
  design_space = c(0, 4),
  kl_fun       = kl_fn_var,
  rival_pars   = c(1, 1.0),
  rival_lower  = c(0.5, 0.8),
  rival_upper  = c(2.0, 1.5)
)
#> 
⠙ Calculating optimal design 10 done (4.9/s) | 2.1s

⠹ Calculating optimal design 11 done (4.8/s) | 2.3s

⠸ Calculating optimal design 12 done (4.7/s) | 2.6s

⠼ Calculating optimal design 13 done (4.6/s) | 2.8s

⠴ Calculating optimal design 14 done (4.6/s) | 3.1s

⠦ Calculating optimal design 15 done (4.6/s) | 3.3s

⠧ Calculating optimal design 16 done (4.6/s) | 3.5s

⠇ Calculating optimal design 17 done (4.6/s) | 3.7s

⠏ Calculating optimal design 18 done (4.7/s) | 3.9s

⠋ Calculating optimal design 19 done (4.6/s) | 4.2s

⠙ Calculating optimal design 20 done (4.6/s) | 4.4s

⠹ Calculating optimal design 21 done (4.6/s) | 4.5s
#> ℹ Stop condition not reached, max iterations performed
#> 
⠸ Calculating optimal design 22 done (4.7/s) | 4.7s
ℹ The lower bound for efficiency is 99.9671378206639%
#> 
                                                    
result_kl_var
#> Optimal design for KL-Optimality:
#>      Point     Weight
#> 1 0.000000 0.16826906
#> 2 1.333333 0.07912872
#> 3 1.989481 0.61240029
#> 4 2.666667 0.14020193
summary(result_kl_var)
#> Model:
#> y ~ a * exp(-b * x)
#> Weight function:
#> function (x) 
#> 1 
#> 
#> KL function: user-supplied
#> Optimal rival parameters: 1.1353, 0.8
#> 
#> Optimal design for KL-Optimality:
#>      Point     Weight
#> 1 0.000000 0.16826906
#> 2 1.333333 0.07912872
#> 3 1.989481 0.61240029
#> 4 2.666667 0.14020193
#> 
#> Minimum efficiency (Atwood): 99.9671378206639%
#> Criterion value:             0.320445
plot(result_kl_var)
KL sensitivity: Normal phi=1 vs Normal phi=4.

KL sensitivity: Normal phi=1 vs Normal phi=4.

Example 4: two-factor MM vs linear rival (make_kl_fun, 2D)

make_kl_fun() works with models that use different subsets of the design variables. Here the reference uses \((x_1, x_2)\) (bisubstrate MM) while the rival depends only on \(x_1\) (single-substrate linear approximation):

kl_fn_2d <- make_kl_fun(
  "Normal",
  model1      = y ~ Vmax * x1 * x2 / ((Km1 + x1) * (Km2 + x2)),
  params1     = c("Vmax", "Km1", "Km2"),
  par_values1 = c(1, 1, 1),
  model2      = y ~ alpha * x1,
  params2     = "alpha"
)

result_kl_2d <- opt_des(
  "KL-Optimality",
  model        = y ~ Vmax * x1 * x2 / ((Km1 + x1) * (Km2 + x2)),
  parameters   = c("Vmax", "Km1", "Km2"),
  par_values   = c(1, 1, 1),
  design_space = list(x1 = c(0.1, 5), x2 = c(0.1, 5)),
  kl_fun       = kl_fn_2d,
  rival_pars   = c(0.5),
  rival_lower  = c(0.05),
  rival_upper  = c(2.0)
)
#> 
⠙ Calculating optimal design 3 done (1.3/s) | 2.3s
#> ℹ Stop condition reached: difference between sensitivity and criterion < 1e-05
#> 
⠹ Calculating optimal design 6 done (2.4/s) | 2.5s
ℹ The lower bound for efficiency is 99.991707653218%
#> 
                                                   
result_kl_2d
#> Optimal design for KL-Optimality (2 factors):
#>         x1  x2    Weight
#> 1 2.062645 5.0 0.7079501
#> 2 5.000000 0.1 0.2920499

The heatmap concentrates support where the difference between the bisubstrate hyperbola and the linear rival is greatest (saturation regime in both \(x_1\) and \(x_2\)):

plot(result_kl_2d)
KL sensitivity heatmap: 2D MM vs linear rival.

KL sensitivity heatmap: 2D MM vs linear rival.

Example 5: discriminating error structures — the Hill model

Reference: De la Calle-Arroyo, Leorato, Rodríguez-Aragón & Tommasi (2025). Augmented designs to choose between constant absolute and relative errors. Chemometrics and Intelligent Laboratory Systems, doi:10.1016/j.chemolab.2025.105374.

Context. Dose-response experiments with the Hill (4-parameter Emax) model:

\[\eta(x, \beta) = (E_{\text{con}} - b) \cdot \frac{(x/\text{IC}_{50})^s}{1 + (x/\text{IC}_{50})^s} + b\]

with \(E_{\text{con}} = 1.70\), \(b = 0.137\), \(\text{IC}_{50} = 111\), \(s = -1.03\) and \(x \in [0.01, 1500]\) nM. Two error structures are competing:

The KL divergence minimised over the rival variance \(\sigma_{\text{abs}}^2\) is

\[\text{KL}(x,\, \sigma_{\text{abs}}^2) = \frac{1}{2}\!\left(\frac{\eta^2(x)}{\sigma_{\text{abs}}^2} - 1 + \log\frac{\sigma_{\text{abs}}^2}{\eta^2(x)}\right).\]

We supply this directly as kl_fun since it is not covered by the standard Normal-vs-Normal formula (the reference variance is \(x\)-dependent):

kl_fun_hill <- function(x, beta2) {
  sigma_sq <- beta2[1]
  eta <- (1.70 - 0.137) * (x / 111)^(-1.03) / (1 + (x / 111)^(-1.03)) + 0.137
  0.5 * (eta^2 / sigma_sq - 1 + log(sigma_sq / eta^2))
}

result_kl_hill <- opt_des(
  "KL-Optimality",
  model        = y ~ (Econ - b) * (x / IC50)^s / (1 + (x / IC50)^s) + b,
  parameters   = c("Econ", "b", "IC50", "s"),
  par_values   = c(1.70, 0.137, 111, -1.03),
  design_space = c(0.01, 1500),
  kl_fun       = kl_fun_hill,
  rival_pars   = c(1.0),
  rival_lower  = c(1e-4),
  rival_upper  = c(1e6)
)
#> 
#> ℹ Stop condition reached: difference between sensitivity and criterion < 1e-05
#> 
⠙ Calculating optimal design 5 done (40/s) | 127ms
ℹ The lower bound for efficiency is 99.9997190548208%
#> 
                                                   
result_kl_hill
#> Optimal design for KL-Optimality:
#>     Point   Weight
#> 1    0.01 0.233994
#> 2 1500.00 0.766006
summary(result_kl_hill)
#> Model:
#> y ~ (Econ - b) * (x/IC50)^s/(1 + (x/IC50)^s) + b
#> Weight function:
#> function (x) 
#> 1 
#> 
#> KL function: user-supplied
#> Optimal rival parameters: 0.7192
#> 
#> Optimal design for KL-Optimality:
#>     Point   Weight
#> 1    0.01 0.233994
#> 2 1500.00 0.766006
#> 
#> Minimum efficiency (Atwood): 99.9997190548208%
#> Criterion value:             0.813492
plot(result_kl_hill)
KL sensitivity for the Hill model (error structure discrimination).

KL sensitivity for the Hill model (error structure discrimination).

The KL-optimal design places all weight at the two extremes of the design space. This matches the result reported in the paper (support at \(x = 0.01\) with weight 0.23 and \(x = 1500\) with weight 0.77). The Atwood criterion at 100 % confirms the design is exactly optimal.

The optimal rival variance recovered internally:

hv_hill <- attr(result_kl_hill, "hidden_value")
cat("Optimal rival sigma_abs^2 =", round(hv_hill$beta2_star, 4), "\n")
#> Optimal rival sigma_abs^2 = 0.7192

Note that with only two support points this design cannot estimate all four parameters of the Hill model. The paper proposes combining it with a D-optimal design via the augmentation workflow described in vignette("optedr-augment").