Suppose we observe \(n\) independent binary trials — each a success (1) or failure (0) — with a common success probability \(\theta \in (0,1)\). The total number of successes \(y = \sum_i y_i\) follows:
\[ y \mid \theta \;\sim\; \mathrm{Binomial}(n,\, \theta). \]
The likelihood is
\[ p(y \mid \theta) \;=\; \binom{n}{y}\,\theta^y\,(1-\theta)^{n-y}, \]
and everything relevant for updating our beliefs about \(\theta\) is captured by the two sufficient statistics \(y\) (successes) and \(n - y\) (failures).
The parameter \(\theta\) is a probability, so its support is \([0,1]\). The Beta distribution is the natural conjugate prior:
\[ p(\theta) \;=\; \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad \theta \in [0,1], \]
with shape parameters \(a > 0\) and \(b > 0\). Its mean is \(a/(a+b)\) and its variance is \(ab/[(a+b)^2(a+b+1)]\).
Interpreting the parameters. Think of \(a - 1\) as a “prior number of successes” and \(b - 1\) as a “prior number of failures” accumulated before the current study. The total “prior sample size” is \(a + b - 2\) and the prior mean is \(a/(a+b)\).
| Prior belief | Parameter choice |
|---|---|
| No information — completely flat | \(a = b = 1\) (uniform on \([0,1]\)) |
| Mild belief near 0.5 | \(a = b = 3\) |
| Moderate belief near 0.4 | \(a = 4, b = 6\) (mean = 0.4) |
| Strong belief near 0.2 | \(a = 4, b = 16\) (mean = 0.2) |
Multiply the likelihood kernel \(\theta^y(1-\theta)^{n-y}\) by the prior kernel \(\theta^{a-1}(1-\theta)^{b-1}\):
\[ p(\theta \mid y) \;\propto\; \theta^{(a+y)-1}(1-\theta)^{(b+n-y)-1}. \]
This is the kernel of a Beta distribution, so the conjugate posterior is:
\[ \boxed{\theta \mid y \;\sim\; \mathrm{Beta}(a + y,\; b + n - y).} \]
The update rule is simple:
\[ \mathbb{E}[\theta \mid y] \;=\; \frac{a + y}{a + b + n} \;=\; \frac{a+b}{a+b+n}\cdot\frac{a}{a+b} \;+\; \frac{n}{a+b+n}\cdot\frac{y}{n}. \]
This is a weighted average of the prior mean \(a/(a+b)\) and the data frequency \(y/n\), with weights proportional to the “prior sample size” \(a+b\) and the actual sample size \(n\). When \(n\) is much larger than \(a + b\) the data dominate; when \(n\) is small the prior pulls the estimate toward its own mean.
bayesrules illustrationThe bayesrules package provides
plot_beta_binomial() and
summarize_beta_binomial() for quick visualization and
tabular summaries of the Beta–Binomial update (Johnson et al. 2022).
Scenario. A clinical researcher believes, based on earlier trials, that about 40% of patients respond to a new analgesic. They encode this as a Beta(4, 6) prior (mean = 0.4, effective prior sample size = 10). In a new pilot study they observe \(y = 14\) responders out of \(n = 30\) patients.
a0 <- 4; b0 <- 6 ## Beta(4, 6) prior: mean = 0.4
y <- 14; n <- 30 ## data: 14/30 successes
## Analytic posterior
post_a <- a0 + y ## 4 + 14 = 18
post_b <- b0 + (n - y) ## 6 + 16 = 22library(bayesrules)
## Overlay prior, likelihood (scaled), and posterior densities
plot_beta_binomial(
alpha = a0,
beta = b0,
y = y,
n = n,
prior = TRUE,
likelihood = TRUE,
posterior = TRUE
)## Tabular summary of prior, likelihood, and posterior
summarize_beta_binomial(alpha = a0, beta = b0, y = y, n = n)
#> model alpha beta mean mode var sd
#> 1 prior 4 6 0.40 0.3750000 0.021818182 0.14770979
#> 2 posterior 18 22 0.45 0.4473684 0.006036585 0.07769547Reading the output. The posterior parameters are Beta(18, 22), giving a posterior mean of \(18/40 = 0.45\) — pulled from the prior mean (0.40) toward the data frequency (\(14/30 \approx 0.47\)) in proportion to their relative weights. The posterior is noticeably narrower than the prior, reflecting the information gained from 30 observations.
Analytic posterior.
bb_analytic <- data.frame(
Example = "Analgesic trial (BayesRules)",
n = n,
y = y,
Posterior = sprintf("Beta(%d, %d)", post_a, post_b),
Mean = post_a / (post_a + post_b),
SD = sqrt(post_a * post_b / ((post_a + post_b)^2 * (post_a + post_b + 1))),
check.names = FALSE
)
knitr::kable(bb_analytic, digits = 4,
caption = "Conjugate Beta--Binomial posterior")| Example | n | y | Posterior | Mean | SD |
|---|---|---|---|---|---|
| Analgesic trial (BayesRules) | 30 | 14 | Beta(18, 22) | 0.45 | 0.0777 |
The Bechdel test asks whether a film features at
least two women who talk to each other about something other than a man.
The bayesrules::bechdel dataset records the test result for
1794 films (Johnson et al. 2022).
Research question. What fraction of films pass the Bechdel test?
Prior. Based on earlier analyses suggesting roughly 45% of films pass, we specify a Beta(9, 11) prior (mean = 0.45, effective sample size = 20 — moderately informative).
library(bayesrules)
## Binary outcome: 1 = PASS, 0 = FAIL
pass <- as.integer(bechdel[["binary"]] == "PASS")
n_bech <- length(pass)
y_bech <- sum(pass)
cat(sprintf("n = %d, y (PASS) = %d, proportion = %.3f\n",
n_bech, y_bech, y_bech / n_bech))
#> n = 1794, y (PASS) = 803, proportion = 0.448a0_b <- 9; b0_b <- 11 ## Beta(9, 11) prior: mean = 0.45
post_a_b <- a0_b + y_bech
post_b_b <- b0_b + (n_bech - y_bech)
c(prior_mean = a0_b / (a0_b + b0_b),
data_freq = round(y_bech / n_bech, 4),
post_mean = round(post_a_b / (post_a_b + post_b_b), 4))
#> prior_mean data_freq post_mean
#> 0.4500 0.4476 0.4476
## 95% credible interval
round(qbeta(c(0.025, 0.975), post_a_b, post_b_b), 4)
#> [1] 0.4248 0.4706With over 1700 films the data dominate the prior; the posterior mean is nearly identical to the observed proportion.
analytic_mean_b <- post_a_b / (post_a_b + post_b_b)
analytic_sd_b <- sqrt(post_a_b * post_b_b /
((post_a_b + post_b_b)^2 * (post_a_b + post_b_b + 1)))
bech_analytic <- data.frame(
Dataset = "Bechdel test",
n = n_bech,
y = y_bech,
Posterior = sprintf("Beta(%d, %d)", post_a_b, post_b_b),
Mean = analytic_mean_b,
SD = analytic_sd_b,
check.names = FALSE
)
knitr::kable(bech_analytic, digits = 4,
caption = "Conjugate Beta--Binomial posterior")| Dataset | n | y | Posterior | Mean | SD |
|---|---|---|---|---|---|
| Bechdel test | 1794 | 803 | Beta(812, 1002) | 0.4476 | 0.0117 |
glmb() using
dBeta()The scalar Beta–Binomial model generalizes to
glmb() with
family = binomial(link = "identity") and the
dBeta() prior. In this intercept-only setting the single
regression coefficient is \(\theta\) directly on the probability
scale.
df_bech <- data.frame(y = pass)
bech_beta <- matrix(a0_b / (a0_b + b0_b), nrow = 1, ncol = 1,
dimnames = list(NULL, "(Intercept)"))
bech_pf <- dBeta(shape1 = a0_b, shape2 = b0_b, beta = bech_beta)
set.seed(2026)
fit_bech <- glmb(
n = 20000,
y ~ 1,
data = df_bech,
weights = rep(1L, n_bech),
family = binomial(link = "identity"),
pfamily = bech_pf
)
print(fit_bech)
#>
#> Call: glmb(formula = y ~ 1, family = binomial(link = "identity"), pfamily = bech_pf,
#> n = 20000, data = df_bech, weights = rep(1L, n_bech))
#>
#> Posterior Mean Coefficients:
#> (Intercept)
#> 0.4477
#>
#> Effective Number of Parameters: 0.9969965
#> Expected Residual Deviance: 2468.272
#> DIC: 2469.269bech_compare <- data.frame(
Dataset = "Bechdel test",
Posterior = bech_analytic$Posterior,
`Analytic Mean` = bech_analytic$Mean,
`Analytic SD` = bech_analytic$SD,
`glmb Post.Mean` = fit_bech$coef.means["(Intercept)"],
`glmb Post.Sd` = sd(fit_bech$coefficients[, "(Intercept)", drop = TRUE]),
check.names = FALSE
)
knitr::kable(bech_compare, digits = 4,
caption = "Analytic vs. glmb() posterior mean and SD")| Dataset | Posterior | Analytic Mean | Analytic SD | glmb Post.Mean | glmb Post.Sd | |
|---|---|---|---|---|---|---|
| (Intercept) | Bechdel test | Beta(812, 1002) | 0.4476 | 0.0117 | 0.4477 | 0.0117 |
The glmb Post.Mean and
glmb Post.Sd columns should match the
analytic Mean and SD to Monte Carlo
error because dBeta() implements the exact conjugate
update.
This scalar prototype is the intuition behind Binomial
glmb() (Chapters 7–9), where instead of a single
\(\theta\) you model \(\mathrm{logit}(\theta_i) = x_i^\top \beta\)
and place a multivariate Normal prior on the coefficient vector \(\beta\). The conjugate dBeta()
path is specific to intercept-only models with the identity link; once
covariates enter, the conjugacy is lost and the general envelope sampler
takes over.
glmb().