Get started

Introduction

SEMs are ubiquitous in the social sciences, psychology, ecology, and other fields. The {INLAvaan} package (Jamil and Rue 2026b) provides a user-friendly interface for fitting Bayesian SEMs using Integrated Nested Laplace Approximations (INLA, Rue et al. 2009), based on a bespoke approximate Bayesian inference framework for SEMs (Jamil and Rue 2026a). This vignette walks through the basics of using {INLAvaan} to fit a simple SEM. Before we begin, make sure you have installed the {INLAvaan} package from GitHub, then load the libraries used in this example:

# Load all libraries for this example
library(INLAvaan)
library(tidyverse)
library(lavaan)

Motivating example

To motivate the use of SEMs, consider the introductory example in Song and Lee (2012): Does poorer glycemic control lead to greater severity of kidney disease? We observe three indicators of glycemic control (y1, y2, y3) and three indicators of kidney disease severity (y4, y5, y6).

Indicator  Description                           Unit
y1         HbA1c (3-month avg. blood glucose)    %
y2         FPG (fasting plasma glucose)          mmol/L
y3         Insulin (fasting insulin level)       μU/mL
y4         PCr (plasma creatinine)               μmol/L
y5         ACR (albumin–creatinine ratio)        mg/g
y6         BUN (blood urea nitrogen)             mmol/L

Rather than fitting separate regression models for each indicator, SEM allows us to model the relationship between the latent constructs themselves, providing a clearer and more coherent representation of the underlying processes. The hypothesised SEM is illustrated by the figure below:

Data

For the two-factor SEM described above, it is easy to simulate data using the {lavaan} package. The code is below:

pop_mod <- "
  eta1 =~ 1*y1 + 0.8*y2 + 0.6*y3
  eta2 =~ 1*y4 + 0.8*y5 + 0.6*y6
  eta2 ~ 0.3*eta1
  
  # Variances
  y1 ~~ 0.5*y1
  y2 ~~ 0.5*y2
  y3 ~~ 0.5*y3
  y4 ~~ 0.5*y4
  y5 ~~ 0.5*y5
  y6 ~~ 0.5*y6
  eta1 ~~ 1*eta1
  eta2 ~~ 1*eta2
"
set.seed(123)
dat <- lavaan::simulateData(pop_mod, sample.nobs = 1000)
str(dat)
#> 'data.frame':    1000 obs. of  6 variables:
#>  $ y1: num  1.355 1.249 -1.03 -0.758 1.082 ...
#>  $ y2: num  0.641 0.782 -1.472 1.07 2.028 ...
#>  $ y3: num  0.9319 0.1245 -0.5622 -0.0245 1.5501 ...
#>  $ y4: num  -0.3453 0.0107 -2.1935 0.5164 -1.0021 ...
#>  $ y5: num  0.132 -0.464 -0.108 -0.334 -3.181 ...
#>  $ y6: num  0.0329 -0.917 -1.216 -0.9273 -0.0237 ...

From the code above, note the true values of the parameters, including the factor loadings Λ, regression coefficient β between the two latent variables, as well as the residual and latent variances Θ and Ψ respectively.

Model fit

Now that we have simulated some data, we can fit the SEM using {INLAvaan}. The model syntax is the same as that of {lavaan}, making the model easy to specify; for further details on the syntax, refer to the lavaan website. {INLAvaan} provides mirror functions for the main model-fitting functions in {lavaan}; for instance, asem() is the approximate-Bayes counterpart of lavaan::sem().

The code to fit the SEM model is below:

mod <- "
  eta1 =~ y1 + y2 + y3
  eta2 =~ y4 + y5 + y6
  eta2 ~ eta1
"
fit <- asem(mod, dat)
#> ℹ Finding posterior mode.
#> ✔ Finding posterior mode. [31ms]
#> 
#> ℹ Computing the Hessian.
#> ✔ Computing the Hessian. [16ms]
#> 
#> ℹ Performing VB correction.
#> ✔ VB correction; mean |δ| = 0.065σ. [52ms]
#> 
#> ⠙ Fitting 0/13 skew-normal marginals.
#> ✔ Fitting 13/13 skew-normal marginals. [82ms]
#> 
#> ℹ Adjusting copula correlations (NORTA).
#> ✔ Adjusting copula correlations (NORTA). [23ms]
#> 
#> ⠙ Posterior sampling and summarising.
#> ✔ Posterior sampling and summarising. [166ms]
#> 

{INLAvaan} computes an approximation to the posterior density by way of a Laplace approximation (Tierney et al. 1989; Jamil and Rue 2026a). The joint mode and the Hessian need to be computed, which together yield a Gaussian approximation to the joint posterior of the parameters. The default optimiser is stats::nlminb(), but other optimisers can be used by specifying optim_method = "ucminf" for the {ucminf} package, or optim_method = "optim" to call stats::optim() with method "BFGS".
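As a quick sketch of switching optimisers using the optim_method argument described above (not run; the second option requires the {ucminf} package to be installed):

```r
## fit_ucminf <- asem(mod, dat, optim_method = "ucminf")  # not run; uses {ucminf}
## fit_bfgs   <- asem(mod, dat, optim_method = "optim")   # not run; stats::optim() with "BFGS"
```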

From this, marginal posterior distributions for each parameter can be obtained by one of several ways, including 1) Skew normal fitting (marginal_method = "skewnorm", the default method, see Chiuchiolo et al. 2023); 2) Two-piece asymmetric Gaussian fitting (marginal_method = "asymgaus", see Martins et al. 2013); 3) Direct marginalisation of the joint Gaussian posterior (marginal_method = "marggaus"); and 4) Sampling from the joint Gaussian posterior (marginal_method = "sampling").
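For reference, selecting each of the four marginalisation methods listed above looks like this (a sketch, not run; the object names are illustrative only):

```r
## fit_sn <- asem(mod, dat, marginal_method = "skewnorm")  # not run; default
## fit_ag <- asem(mod, dat, marginal_method = "asymgaus")  # not run
## fit_mg <- asem(mod, dat, marginal_method = "marggaus")  # not run
## fit_sp <- asem(mod, dat, marginal_method = "sampling")  # not run
```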

Once the marginal posterior distributions have been obtained, we can further use these to compute any derived quantities of interest via copula sampling. The posterior predictive p-values (Gelman et al. 1996) and the Deviance Information Criterion (DIC, Spiegelhalter et al. 2002) are computed this way. Often, the posterior sampling takes longer than the model fitting itself, so the number of samples can be controlled via the nsamp argument (default nsamp = 1000), or the sampling can be skipped altogether (test = "none").
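Using the nsamp and test arguments just described, one can trade precision of the sampled quantities for speed (a sketch, not run; the object names are illustrative only):

```r
## fit_more  <- asem(mod, dat, nsamp = 5000)   # not run; more samples, slower
## fit_quick <- asem(mod, dat, test = "none")  # not run; skip PPP/DIC computation
```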

Methods

The resulting object is of class INLAvaan, a subclass of lavaan objects.

str(fit, 1)
#> Formal class 'INLAvaan' [package "INLAvaan"] with 21 slots
fit
#> INLAvaan 0.2.4 ended normally after 47 iterations
#> 
#>   Estimator                                      BAYES
#>   Optimization method                           NLMINB
#>   Number of model parameters                        13
#> 
#>   Number of observations                          1000
#> 
#> Model Test (User Model):
#> 
#>    Marginal log-likelihood                   -8069.491 
#>    PPP (Chi-square)                              0.257

As a result, most of the methods that work for lavaan objects will also work for INLAvaan objects. The most common ones are probably coef() and summary().

# Inspect coefficients
coef(fit)
#>   eta1=~y2   eta1=~y3   eta2=~y5   eta2=~y6  eta2~eta1     y1~~y1     y2~~y2 
#>      0.872      0.666      0.799      0.613      0.276      0.512      0.474 
#>     y3~~y3     y4~~y4     y5~~y5     y6~~y6 eta1~~eta1 eta2~~eta2 
#>      0.489      0.475      0.531      0.473      1.012      0.899

# Summary of results
summary(fit)
#> INLAvaan 0.2.4 ended normally after 47 iterations
#> 
#>   Estimator                                      BAYES
#>   Optimization method                           NLMINB
#>   Number of model parameters                        13
#> 
#>   Number of observations                          1000
#> 
#> Model Test (User Model):
#> 
#>    Marginal log-likelihood                   -8069.491 
#>    PPP (Chi-square)                              0.257 
#> 
#> Information Criteria:
#> 
#>    Deviance (DIC)                            16033.429 
#>    Effective parameters (pD)                    13.090 
#> 
#> Parameter Estimates:
#> 
#>    Marginalisation method                     SKEWNORM
#>    VB correction                                  TRUE
#> 
#> Latent Variables:
#>                    Estimate       SD     2.5%    97.5%     NMAD    Prior       
#>   eta1 =~                                                                      
#>     y1                1.000                                                    
#>     y2                0.872    0.042    0.792    0.956    0.004    normal(0,10)
#>     y3                0.666    0.034    0.601    0.733    0.003    normal(0,10)
#>   eta2 =~                                                                      
#>     y4                1.000                                                    
#>     y5                0.799    0.043    0.716    0.886    0.006    normal(0,10)
#>     y6                0.613    0.035    0.546    0.683    0.004    normal(0,10)
#> 
#> Regressions:
#>                    Estimate       SD     2.5%    97.5%     NMAD    Prior       
#>   eta2 ~                                                                       
#>     eta1              0.276    0.039    0.200    0.353    0.001    normal(0,10)
#> 
#> Variances:
#>                    Estimate       SD     2.5%    97.5%     NMAD    Prior       
#>    .y1                0.512    0.045    0.424    0.602    0.003 gamma(1,.5)[sd]
#>    .y2                0.474    0.036    0.404    0.547    0.002 gamma(1,.5)[sd]
#>    .y3                0.489    0.028    0.436    0.546    0.000 gamma(1,.5)[sd]
#>    .y4                0.475    0.049    0.379    0.572    0.006 gamma(1,.5)[sd]
#>    .y5                0.531    0.037    0.460    0.605    0.001 gamma(1,.5)[sd]
#>    .y6                0.473    0.027    0.422    0.528    0.000 gamma(1,.5)[sd]
#>     eta1              1.012    0.075    0.870    1.164    0.003 gamma(1,.5)[sd]
#>    .eta2              0.899    0.070    0.767    1.041    0.004 gamma(1,.5)[sd]

It’s possible to request posterior medians and modes in the summary output by specifying postmedian = TRUE or postmode = TRUE in the summary() function.
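For example (not run):

```r
## summary(fit, postmedian = TRUE)  # not run; add posterior medians
## summary(fit, postmode = TRUE)    # not run; add posterior modes
```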

Predictions

Predicted values for the latent variables can be obtained using the predict() function. This is done by sampling from the posterior distributions of the latent variables given the observed data. The function also supports predictions for observed variables (e.g. type = "ov") and missing data imputation, respecting multilevel structure if present.

eta_preds <- predict(fit, nsamp = 100)
length(eta_preds)
#> [1] 100
head(eta_preds[[1]])
#>            eta1       eta2
#> [1,]  1.1670840 -0.6320355
#> [2,]  0.7293198  0.1596273
#> [3,] -0.9065782 -1.3504897
#> [4,]  0.5051917 -0.2648771
#> [5,]  2.4458991 -1.8979947
#> [6,] -1.6668187 -0.9006794

This is an S3 object with a summary method that provides posterior means and credible intervals for the latent variables. Alternatively, users can compute their own summary statistics from the list of posterior samples returned by predict().

summ_eta <- summary(eta_preds)
str(summ_eta)
#> List of 7
#>  $ group_id: NULL
#>  $ Mean    : num [1:1000, 1:2] 0.8844 0.6881 -1.1299 0.0378 1.3533 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  $ SD      : num [1:1000, 1:2] 0.455 0.412 0.442 0.407 0.406 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  $ 2.5%    : num [1:1000, 1:2] 0.1392 -0.0713 -2.1025 -0.7508 0.5734 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  $ 50%     : num [1:1000, 1:2] 0.813 0.673 -1.103 0.011 1.372 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  $ 97.5%   : num [1:1000, 1:2] 1.867 1.57 -0.456 0.844 2.239 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  $ Mode    : num [1:1000, 1:2] 0.7639 0.671 -1.0081 -0.0488 1.3946 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "eta1" "eta2"
#>  - attr(*, "class")= chr "summary.predict.inlavaan_internal"
head(summ_eta$Mean)
#>             eta1        eta2
#> [1,]  0.88441439 -0.01240756
#> [2,]  0.68805670 -0.28895412
#> [3,] -1.12989370 -1.29848448
#> [4,]  0.03776403 -0.05228932
#> [5,]  1.35331890 -1.37743617
#> [6,] -1.80603214 -0.92398580
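The type = "ov" option mentioned earlier returns posterior predictive samples of the observed variables instead of the latent ones; a sketch (not run, object name illustrative only):

```r
## y_preds <- predict(fit, type = "ov", nsamp = 100)  # not run; predict observed ys
```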

Diagnostics

The diagnostics() function reports convergence and approximation-quality metrics for the fitted model. Global diagnostics (type = "global") check whether the optimiser converged, and quantify how well the skew-normal marginals match the joint posterior (via KL divergence and NMAD). Per-parameter diagnostics (type = "param") provide gradient norms and KL contributions for each free parameter, which is useful for identifying any problematic parameters.

diagnostics(fit)
#>          npar         nsamp     converged    iterations      grad_inf 
#>            13          1000             1            47      4.69e-03 
#>  grad_inf_rel       grad_l2     hess_cond    vb_applied vb_kld_global 
#>      1.15e-01      8.84e-03      2.84e+01             1        6.3235 
#>       kld_max      kld_mean      nmad_max     nmad_mean 
#>        0.0115        0.0031        0.0062        0.0029
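The per-parameter diagnostics described above can be requested via the type argument (not run):

```r
## diagnostics(fit, type = "param")  # not run; gradient norms and KL per free parameter
```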

The timing() function reports how long each computation stage took, which can help identify bottlenecks when scaling to larger models.

timing(fit)
#>  total 
#> 0.40 s

Plot

A simple plot method is provided to view the marginal posterior distributions of the parameters. The vertical lines indicate the posterior mode.

plot(fit)

Model comparison

In addition to several global fit indices (i.e. PPP, DIC), it is possible to compare models by way of Bayes factors using the compare() function. This function takes two INLAvaan objects and computes the Bayes factor using the Laplace approximations to the marginal likelihoods.

mod2 <- "
  # A model with uncorrelated factors
  eta1 =~ y1 + y2 + y3
  eta2 =~ y4 + y5 + y6
  eta1 ~~ 0*eta2
"
fit2 <- asem(mod2, dat)
#> ℹ Finding posterior mode.
#> ✔ Finding posterior mode. [17ms]
#> 
#> ℹ Computing the Hessian.
#> ✔ Computing the Hessian. [10ms]
#> 
#> ℹ Performing VB correction.
#> ✔ VB correction; mean |δ| = 0.039σ. [22ms]
#> 
#> ⠙ Fitting 0/12 skew-normal marginals.
#> ✔ Fitting 12/12 skew-normal marginals. [62ms]
#> 
#> ℹ Adjusting copula correlations (NORTA).
#> ✔ Adjusting copula correlations (NORTA). [16ms]
#> 
#> ⠙ Posterior sampling and summarising.
#> ✔ Posterior sampling and summarising. [144ms]
#> 
compare(fit, fit2)
#> Bayesian Model Comparison (INLAvaan)
#> Models ordered by marginal log-likelihood
#> 
#>  Model npar Marg.Loglik   logBF      DIC     pD
#>    fit   13   -8069.491   0.000 16033.43 13.090
#>   fit2   12   -8089.698 -20.208 16083.56 11.769

As a note, there have been several criticisms of the use of Bayes factors for model comparison, particularly in the context of SEMs (Tendeiro and Kiers 2019; Schad et al. 2023). The {blavaan} package implements WAIC and LOO as alternative model comparison metrics, and these will hopefully be added in future versions of {INLAvaan}.

Setting priors

The {INLAvaan} package uses the same prior specification syntax as {blavaan} (Merkle and Rosseel 2018; Merkle et al. 2021), as detailed in the {blavaan} documentation. Essentially, there are two ways to set priors for model parameters: 1) Globally for all parameters of a certain type (e.g., all factor loadings, all regression coefficients, etc.); and 2) Individually for specific parameters in the model syntax.

The default global priors are derived from {blavaan}:

priors_for()  # similar to blavaan::dpriors()
#>                nu             alpha            lambda              beta 
#>    "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0,10)" 
#>             theta               psi               rho               tau 
#> "gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)"   "normal(0,1.5)"

Note that {INLAvaan} uses the separation strategy for variance matrices, and consequently places priors on correlations instead of covariances. If we instead wished to set global priors, say a gamma distribution on variances rather than on standard deviations (the default), we would do the following:

DP <- priors_for(theta = "gamma(1,1)", psi = "gamma(1,1)")
DP
#>              nu           alpha          lambda            beta           theta 
#>  "normal(0,32)"  "normal(0,10)"  "normal(0,10)"  "normal(0,10)"    "gamma(1,1)" 
#>             psi             rho             tau 
#>    "gamma(1,1)"     "beta(1,1)" "normal(0,1.5)"
## fit <- asem(mod, dat, dpriors = DP)  # not run

To set individual priors for specific parameters, we can do so in the model syntax itself. For instance, to set a normal prior with mean 1 and standard deviation 3 for the factor loading of y3 on eta1, and a normal prior with mean 0 and standard deviation 0.5 for the regression coefficient from eta1 to eta2, we would specify the model as follows:

mod <- "
  eta1 =~ y1 + y2 + prior('normal(1,3)')*y3
  eta2 =~ y4 + y5 + y6
  eta2 ~ prior('normal(0,.5)')*eta1
"
## fit <- asem(mod, dat)  # not run

Dependency on R-INLA

Dependency on R-INLA has been temporarily removed for the current version of {INLAvaan} (>= 0.2.0). For a wide class of LVMs and SEMs where the latent variables are unstructured and independent, the current implementation is sufficient. However, future versions of {INLAvaan} will re-introduce the dependency on R-INLA to allow for more complex latent structures, such as spatial and temporal dependencies.

References

Chiuchiolo, Cristian, Janet Van Niekerk, and Håvard Rue. 2023. “Joint Posterior Inference for Latent Gaussian Models with R-INLA.” Journal of Statistical Computation and Simulation 93 (5): 723–52. https://doi.org/10.1080/00949655.2022.2117813.
Gelman, Andrew, Xiao-Li Meng, and Hal Stern. 1996. “Posterior Predictive Assessment of Model Fitness Via Realized Discrepancies.” Statistica Sinica 6 (4): 733–60. https://www.jstor.org/stable/24306036.
Jamil, Haziq, and Håvard Rue. 2026a. Approximate Bayesian Inference for Structural Equation Models Using Integrated Nested Laplace Approximations. arXiv. https://doi.org/10.48550/arXiv.2603.25690.
Jamil, Haziq, and Håvard Rue. 2026b. Implementation and Workflows for INLA-Based Approximate Bayesian Structural Equation Modelling. arXiv. https://doi.org/10.48550/arXiv.2604.00671.
Martins, Thiago G., Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. “Bayesian Computing with INLA: New Features.” Computational Statistics & Data Analysis 67 (November): 68–83. https://doi.org/10.1016/j.csda.2013.04.014.
Merkle, Edgar C., Ellen Fitzsimmons, James Uanhoro, and Ben Goodrich. 2021. “Efficient Bayesian Structural Equation Modeling in Stan.” Journal of Statistical Software 100 (November): 1–22. https://doi.org/10.18637/jss.v100.i06.
Merkle, Edgar C., and Yves Rosseel. 2018. “blavaan: Bayesian Structural Equation Models via Parameter Expansion.” Journal of Statistical Software 85 (June): 1–30. https://doi.org/10.18637/jss.v085.i04.
Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society Series B: Statistical Methodology 71 (2): 319–92. https://doi.org/10.1111/j.1467-9868.2008.00700.x.
Schad, Daniel J., Bruno Nicenboim, Paul-Christian Bürkner, Michael Betancourt, and Shravan Vasishth. 2023. “Workflow Techniques for the Robust Use of Bayes Factors.” Psychological Methods (US) 28 (6): 1404–26. https://doi.org/10.1037/met0000472.
Song, Xin‐Yuan, and Sik‐Yum Lee. 2012. Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences. 1st ed. Wiley Series in Probability and Statistics. Wiley. https://doi.org/10.1002/9781118358887.
Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian Measures of Model Complexity and Fit.” Journal of the Royal Statistical Society Series B: Statistical Methodology 64 (4): 583–639.
Tendeiro, Jorge N., and Henk A. L. Kiers. 2019. “A Review of Issues about Null Hypothesis Bayesian Testing.” Psychological Methods (US) 24 (6): 774–95. https://doi.org/10.1037/met0000221.
Tierney, Luke, Robert E. Kass, and Joseph B. Kadane. 1989. “Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions.” Journal of the American Statistical Association 84 (407): 710–16. https://doi.org/10.1080/01621459.1989.10478824.