---
title: "Longitudinal MAIHDA: intersectional inequalities over time"
author: "Hamid Bulut"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Longitudinal MAIHDA: intersectional inequalities over time}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
library(MAIHDA)
```

## From a snapshot to a trajectory

Standard (cross-sectional) MAIHDA answers "do intersectional strata differ?" at a
single point in time, summarised by the between-stratum VPC. With repeated
measurements we can ask a richer question: *"do strata differ in how they **change**
over time?"* -- and decompose those trajectory differences into an additive part
(the dimensions' main effects and their interactions with time) and a
multiplicative part (an intersectional trajectory beyond additive).

This is the longitudinal MAIHDA of Bell, Evans, Holman & Leckie (2024). It is a
3-level growth-curve model with measurement occasions (level 1) within individuals
(level 2) within intersectional strata (level 3), with a random intercept and a
slope on time at both the individual and stratum levels:

$$
y_{tij} = \beta_0 + \beta_1 t + \underbrace{(u_{0j} + u_{1j}\,t)}_{\text{stratum}}
        + \underbrace{(v_{0ij} + v_{1ij}\,t)}_{\text{individual}} + e_{tij}.
$$

Because the stratum random effect now has a slope, the between-stratum variance becomes a **function of time**:

$$
\operatorname{Var}_S(t) = \sigma^2_{u0} + 2t\,\sigma_{u01} + t^2 \sigma^2_{u1},
\qquad
\operatorname{VPC}_S(t) = \frac{\operatorname{Var}_S(t)}
                               {\operatorname{Var}_S(t) + \operatorname{Var}_I(t) + \sigma^2_e}.
$$

## The data

The bundled `maihda_long_data` is a simulated panel: 600 people, each measured over
five waves, within 12 strata of gender $\times$ ethnicity $\times$ education. The
trajectory differences are constructed to be mostly additive with one genuine
interaction, so the decomposition below is interpretable.

```{r data}
data(maihda_long_data)
head(maihda_long_data)
```

It is long format, one row per person-occasion, with a person id (`id`) and a
numeric time (`wave`).

## Fitting and the time-varying VPC

Supply `id` and `time` to `fit_maihda()`. You only need to write the strata shorthand
`(1 | var1:var2)`; the growth slopes on time are added for you.

```{r fit}
m <- fit_maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
                data = maihda_long_data, id = "id", time = "wave")
summary(m)
```

`summary()` reports the VPC at the baseline (reference time, the earliest wave)
and the full trajectory of the VPC across the observed times. Plot it:

```{r vpc-traj}
plot(m, type = "vpc_trajectory")   # VPC(t), with the reference time marked
plot(m, type = "trajectories")     # predicted per-stratum mean trajectories
```

A rising VPC trajectory means the strata fan out over time (intersectional
inequality grows). 

## Decomposing the trajectory: additive vs. multiplicative

`maihda(decomposition = "longitudinal")` (selected automatically when `id`/`time`
are supplied) fits a null growth model and an adjusted growth model. The
adjusted model adds the dimensions' main effects and their interactions with time
(`dim:time`), so the stratum-level variance it leaves behind is the interaction
beyond additive.

```{r decomp}
a <- maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
            data = maihda_long_data, id = "id", time = "wave",
            decomposition = "longitudinal")
a$pcv
```

The PCV is reported separately for the two pieces of the trajectory:

* **`PCV_intercept`** -- the share of the *baseline* between-stratum inequality
  explained by the additive main effects.
* **`PCV_slope`** -- the share of the *trajectory* (slope) between-stratum inequality
  explained additively. A high `PCV_slope` is the "trajectories are mostly additive"
  finding; the remainder is the multiplicative/interaction part.

```{r pcv-traj}
plot(a, type = "pcv_trajectory")   # the additive share over time
```

## Scope and cautions

* **Identifiability.** A stratum random slope needs enough occasions per stratum;
  sparse strata or few waves can give a singular lme4 fit (surfaced in the fit
  diagnostics) -- the brms engine handles this better.
* **Not a single number.** The VPC is time-varying, so `extract_between_variance()`
  and `calculate_pvc()` deliberately refuse a longitudinal model; use the
  longitudinal decomposition above.
* **Out of scope (v1).** Design-weighted, contextual (stratum $\times$ place
  $\times$ time), and `wemix`/`ordinal` longitudinal models are not yet supported.

## Reference

Bell, A., Evans, C., Holman, D., & Leckie, G. (2024). Extending intersectional
multilevel analysis of individual heterogeneity and discriminatory accuracy
(MAIHDA) to study individual longitudinal trajectories, with application to mental
health in the UK. *Social Science & Medicine*, 351, 116955.
<doi:10.1016/j.socscimed.2024.116955>
