Chapter 02-S05: Gamma–Gamma Conjugacy for One Response Rate

Kjell Nygren

2026-06-21

library(glmbayes)

1. The model

1.1 Likelihood

Suppose we observe \(n\) positive continuous responses \(y_1, \dots, y_n\) — waiting times, survival times, monetary amounts — drawn from a Gamma distribution with known shape \(k > 0\) and unknown rate \(\beta > 0\):

\[ Y_i \mid \beta \;\sim\; \mathrm{Gamma}(k,\, \beta), \qquad i = 1,\dots,n. \]

The mean response is \(\mu = k/\beta\) and the variance is \(k/\beta^2\). The log-likelihood (up to constants) is:

\[ \ell(\beta) \;\propto\; n k \log\beta - \beta\,\textstyle\sum_i y_i. \]

Why fix the shape? Conjugacy requires \(k\) to be treated as known. In practice you either know it from domain knowledge (e.g., exponential responses have \(k = 1\)) or estimate it by method-of-moments: \(\hat k = \bar y^2 / s^2\). When both \(k\) and \(\beta\) are unknown, conjugacy breaks and a rejection sampler is needed — that case is handled by Gamma(log) with glmbdisp().

The special case \(k = 1\) gives the Exponential distribution, the natural model for inter-event waiting times under a constant-hazard assumption.

1.2 Prior

Because \(\beta > 0\), the Gamma distribution is again the natural conjugate prior:

\[ \beta \;\sim\; \mathrm{Gamma}(\alpha_0,\, b_0), \]

with prior mean \(\alpha_0 / b_0\) and variance \(\alpha_0 / b_0^2\).

Relationship to the dispersion. In GLM terms the Gamma dispersion is \(\phi = 1/k\), so the shape \(k\) is the precision \(1/\phi\). Specifying lik_shape = k in dGamma(Inv_Dispersion = FALSE) is equivalent to specifying a known dispersion.

The “identity” link. When you pass family = Gamma(link = "identity") to glmb(), the single regression coefficient is the rate \(\beta\) directly. This matches the conjugate parameterization.

1.3 Posterior derivation

Multiply the likelihood kernel \(\beta^{n k}\,e^{-\beta \sum y_i}\) by the prior kernel \(\beta^{\alpha_0 - 1}\,e^{-b_0 \beta}\):

\[ p(\beta \mid y) \;\propto\; \beta^{(\alpha_0 + n k) - 1}\,e^{-(b_0 + \sum_i y_i)\,\beta}. \]

This is a Gamma kernel, giving the conjugate posterior:

\[ \boxed{\beta \mid y \;\sim\; \mathrm{Gamma}\!\left(\alpha_0 + n k,\;\; b_0 + \textstyle\sum_i y_i\right).} \]

Update rule:

The posterior mean rate is \((\alpha_0 + n k)/(b_0 + \sum y_i)\), and the posterior mean response is:

\[ \mathbb{E}[\mu \mid y] \;\approx\; \frac{k}{\mathbb{E}[\beta \mid y]} \;=\; \frac{k(b_0 + \sum_i y_i)}{\alpha_0 + n k}. \]


2. Illustration with base-R plots

The bayesrules package does not currently include a plot_gamma_gamma() helper, so we plot the prior, the (normalized) likelihood, and the posterior directly with curve().

Scenario. You record the waiting time between calls to a customer service hotline. Exponential waiting times (\(k = 1\)) are a standard model. You place a \(\Gamma(2, 4)\) prior on the call rate \(\beta\) (prior mean = 0.5 calls/minute). You then observe \(n = 20\) inter-arrival times with sum \(\sum y = 35\) minutes.

alpha0 <- 2;  b0 <- 4         ## Gamma(2, 4) prior: mean = 0.5
k      <- 1                   ## known shape (exponential)
n_obs  <- 20L;  sum_y <- 35   ## data: 20 observations, sum = 35

post_shape <- alpha0 + n_obs * k
post_rate  <- b0    + sum_y

ggamma_analytic <- data.frame(
  Example = "Call inter-arrivals (illustration)",
  n = n_obs,
  `sum(y)` = sum_y,
  Posterior = sprintf("Gamma(%d, %d)", post_shape, post_rate),
  `Mean (rate)` = post_shape / post_rate,
  `SD (rate)`   = sqrt(post_shape) / post_rate,
  check.names = FALSE
)
knitr::kable(ggamma_analytic, digits = 4,
  caption = "Conjugate Gamma--Gamma posterior for rate beta")
Conjugate Gamma–Gamma posterior for rate beta
Example n sum(y) Posterior Mean (rate) SD (rate)
Call inter-arrivals (illustration) 20 35 Gamma(22, 39) 0.5641 0.1203
beta_grid <- seq(0.01, 1.5, length.out = 500)

## Normalized likelihood: Gamma(n*k + 1, sum_y) shape, treated as density
lik_unnorm <- dgamma(beta_grid, shape = n_obs * k + 1, rate = sum_y)

## Scale likelihood to have same peak height as the prior (visual only)
prior_vals   <- dgamma(beta_grid, alpha0, b0)
post_vals    <- dgamma(beta_grid, post_shape, post_rate)
lik_scaled   <- lik_unnorm * (max(prior_vals) / max(lik_unnorm))

plot(beta_grid, prior_vals, type = "l", lwd = 2, col = "steelblue",
     xlab = expression(beta), ylab = "Density",
     main = "Gamma–Gamma update: prior, likelihood, posterior",
     ylim = c(0, max(post_vals) * 1.1))
lines(beta_grid, lik_scaled, lwd = 2, col = "darkgreen", lty = 2)
lines(beta_grid, post_vals,  lwd = 2, col = "tomato")
legend("topright",
       legend = c("Prior  Gamma(2, 4)",
                  "Likelihood (scaled)",
                  sprintf("Posterior  Gamma(%d, %d)", post_shape, post_rate)),
       col  = c("steelblue", "darkgreen", "tomato"),
       lty  = c(1, 2, 1), lwd = 2, bty = "n")
abline(v = alpha0 / b0,          lty = 3, col = "steelblue")
abline(v = post_shape / post_rate, lty = 3, col = "tomato")

The posterior is pulled from the prior mean (0.5) toward the data-based MLE (\(n/\sum y = 20/35 \approx 0.57\)) and is considerably tighter than either.


3. Real data: Boston Marathon finishing times

The bayesrules::cherry_blossom_sample dataset records net finishing times (minutes) for runners in a 10-mile cherry blossom race (Johnson et al. 2022). Finishing times are continuous and positive; we model them as Gamma-distributed with a method-of-moments estimate of the shape \(k\).

library(bayesrules)

## Use net finishing times (minutes); drop missing values
y_cb <- cherry_blossom_sample$net
y_cb <- y_cb[is.finite(y_cb)]
n_cb <- length(y_cb)

ybar_cb <- mean(y_cb)
s2_cb   <- var(y_cb)

## Method-of-moments estimate of shape k: k_hat = ybar^2 / s^2
k_hat_cb <- ybar_cb^2 / s2_cb

cat(sprintf("n = %d,  mean = %.2f min,  var = %.2f,  k_hat = %.3f\n",
            n_cb, ybar_cb, s2_cb, k_hat_cb))
#> n = 185,  mean = 89.95 min,  var = 195.44,  k_hat = 41.395

The estimated shape \(\hat k \approx\) 41.4 is treated as fixed for the conjugate analysis.

Prior. Based on general knowledge that competitive 10-mile race times cluster around 85–100 minutes, we set a \(\Gamma(5, 0.06)\) prior on the rate \(\beta\) (prior mean rate \(\approx 0.083\), corresponding to a prior mean time \(k/\beta \approx 85\) minutes for the estimated \(k\)).

## Set prior centered at rate corresponding to ~85 min mean finish time
## E[mu] = k / beta_prior  =>  beta_prior = k_hat / 85
alpha0_cb <- 5
b0_cb     <- alpha0_cb / (k_hat_cb / ybar_cb)  ## prior mean rate = k_hat / ybar

cat(sprintf("Prior: Gamma(%.2f, %.4f),  prior mean rate = %.4f,  implied mean time = %.1f min\n",
            alpha0_cb, b0_cb,
            alpha0_cb / b0_cb,
            k_hat_cb / (alpha0_cb / b0_cb)))
#> Prior: Gamma(5.00, 10.8644),  prior mean rate = 0.4602,  implied mean time = 89.9 min
post_shape_cb <- alpha0_cb + n_cb * k_hat_cb
post_rate_cb  <- b0_cb     + sum(y_cb)

post_mean_rate_cb <- post_shape_cb / post_rate_cb
post_sd_rate_cb   <- sqrt(post_shape_cb) / post_rate_cb
post_mean_time_cb <- k_hat_cb / post_mean_rate_cb

cb_analytic <- data.frame(
  Dataset = "Cherry blossom finish times",
  n = n_cb,
  `k (fixed)` = k_hat_cb,
  Posterior = sprintf("Gamma(%.2f, %.4f)", post_shape_cb, post_rate_cb),
  `Mean (rate beta)` = post_mean_rate_cb,
  `SD (rate beta)`   = post_sd_rate_cb,
  `Mean time (min)`  = post_mean_time_cb,
  check.names = FALSE
)
knitr::kable(cb_analytic, digits = 4,
  caption = "Conjugate Gamma--Gamma posterior for rate beta (shape--rate)")
Conjugate Gamma–Gamma posterior for rate beta (shape–rate)
Dataset n k (fixed) Posterior Mean (rate beta) SD (rate beta) Mean time (min)
Cherry blossom finish times 185 41.3954 Gamma(7663.15, 16651.0310) 0.4602 0.0053 89.9468

3.1 Fitting with glmb()

df_cb <- data.frame(y = y_cb)

cb_beta <- matrix(alpha0_cb / b0_cb, nrow = 1, ncol = 1,
                  dimnames = list(NULL, "(Intercept)"))

cb_pf <- dGamma(
  shape          = alpha0_cb,
  rate           = b0_cb,
  beta           = cb_beta,
  Inv_Dispersion = FALSE,
  lik_shape      = k_hat_cb
)

set.seed(2026)
fit_cb <- glmb(
  n       = 20000,
  y ~ 1,
  data    = df_cb,
  family  = Gamma(link = "identity"),
  pfamily = cb_pf
)
print(fit_cb)
#> 
#> Call:  glmb(formula = y ~ 1, family = Gamma(link = "identity"), pfamily = cb_pf, 
#>     n = 20000, data = df_cb)
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>      0.4603  
#> 
#> Effective Number of Parameters: 40.76645 
#> Expected Residual Deviance: 7790.162 
#> DIC: 62148.07
cb_compare <- data.frame(
  Dataset = "Cherry blossom finish times",
  Posterior = cb_analytic$Posterior,
  `Analytic Mean (rate)` = cb_analytic$`Mean (rate beta)`,
  `Analytic SD (rate)`   = cb_analytic$`SD (rate beta)`,
  `glmb Post.Mean` = fit_cb$coef.means["(Intercept)"],
  `glmb Post.Sd`   = sd(fit_cb$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(cb_compare, digits = 4,
  caption = "Analytic vs. glmb() posterior mean and SD for rate beta")
Analytic vs. glmb() posterior mean and SD for rate beta
Dataset Posterior Analytic Mean (rate) Analytic SD (rate) glmb Post.Mean glmb Post.Sd
(Intercept) Cherry blossom finish times Gamma(7663.15, 16651.0310) 0.4602 0.0053 0.4603 0.0052

The glmb Post.Mean and glmb Post.Sd columns refer to the rate \(\beta\). Mean finish time is \(\hat k\) divided by posterior mean rate (see Mean time in the analytic table).


4. Connection to glmbayes

The scalar Gamma–Gamma model is the conjugate prototype for Gamma glmb() with family = Gamma(link = "identity"). In that setting the single coefficient is the rate \(\beta\), the prior is dGamma(shape, rate, beta, Inv_Dispersion = FALSE, lik_shape = k), and the posterior update is the closed form above.

When the shape \(k\) is unknown, the Gamma prior on \(k\) is no longer conjugate. That case is handled by Gamma(link = "log") with dNormal() priors on the regression coefficients plus glmbdisp() for the dispersion \(\phi = 1/k\) (Chapter 10).


See also

References

Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com.