% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rbmi.R
\name{rbmi_analyse}
\alias{rbmi_analyse}
\title{Analyse Multiple Imputed Datasets}
\usage{
rbmi_analyse(
  imputations,
  fun = rbmi_ancova,
  delta = NULL,
  ...,
  cluster_or_cores = 1,
  .validate = TRUE
)
}
\arguments{
\item{imputations}{An \code{imputations} object as created by the impute() function from the rbmi package.}

\item{fun}{An analysis function to be applied to each imputed dataset. See details.}

\item{delta}{A \code{data.frame} containing the delta transformation to be applied to the imputed
datasets prior to running \code{fun}. See details.}

\item{...}{Additional arguments passed onto \code{fun}.}

\item{cluster_or_cores}{(\code{numeric} or \verb{cluster object})\cr
The number of parallel processes to use when running this function. Can also be a
cluster object created by \code{\link[=make_rbmi_cluster]{make_rbmi_cluster()}}. See the parallelisation section below.}

\item{.validate}{(\code{logical})\cr
Should \code{imputations} be checked to ensure it conforms to the required format
(default = \code{TRUE})? Setting this to \code{FALSE} can give a small performance gain when
analysing a large number of samples.}
}
\value{
An \code{analysis} object, as defined by \code{rbmi}, representing the desired
analysis applied to each of the imputed datasets in \code{imputations}.
}
\description{
This function takes multiple imputed datasets (as generated by
the impute() function from the rbmi package) and runs an analysis function on
each of them.
}
\details{
This function works by performing the following steps:
\enumerate{
\item Extract a dataset from the \code{imputations} object.
\item Apply any delta adjustments as specified by the \code{delta} argument.
\item Run the analysis function \code{fun} on the dataset.
\item Repeat steps 1-3 across all of the datasets inside the \code{imputations}
object.
\item Collect and return all of the analysis results.
}
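
The steps above can be sketched in base R as follows. This is an illustrative sketch only and
not the package's implementation: a plain list of data.frames stands in for the \code{imputations}
object, and \code{toy_imputations}, \code{toy_delta}, \code{apply_delta()} and \code{toy_fun()} are
hypothetical names introduced for the example.
\preformatted{
# Hypothetical stand-in for an imputations object: a plain list of data.frames
toy_imputations <- list(
    data.frame(id = 1:4, visit = 1, outcome = c(1, 2, 3, 4)),
    data.frame(id = 1:4, visit = 1, outcome = c(2, 3, 4, 5))
)
toy_delta <- data.frame(id = 1:4, visit = 1, delta = c(0, 0, 1, 1))

# Step 2: merge the offsets onto the dataset and shift the outcome
apply_delta <- function(dat, delta) {
    if (is.null(delta)) return(dat)
    dat <- merge(dat, delta, by = c("id", "visit"))
    dat$outcome <- dat$outcome + dat$delta
    dat
}

# Step 3: an analysis function returning a named list of results
toy_fun <- function(dat, ...) {
    list(mean_outcome = list(est = mean(dat$outcome)))
}

# Steps 1, 4 and 5: loop over the datasets and collect the results
results <- lapply(toy_imputations, function(dat) {
    toy_fun(apply_delta(dat, toy_delta))
})
}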

The analysis function \code{fun} must take a \code{data.frame} as its first
argument. Any additional arguments supplied to \code{\link[=rbmi_analyse]{rbmi_analyse()}} via \code{...}
are passed on to \code{fun}.
\code{fun} must return a named list in which each element is itself a
list containing a single numeric element called \code{est}, plus the numeric
elements \code{se} and \code{df} if you originally specified the method_bayes() or
method_approxbayes() functions from the rbmi package, e.g.:
\preformatted{
# Assumes `group` is numeric, so that its coefficient is named 'group'
myfun <- function(dat, ...) {
    mod_1 <- lm(data = dat, outcome ~ group)
    mod_2 <- lm(data = dat, outcome ~ group + covar)
    x <- list(
        trt_1 = list(
            est = coef(mod_1)[['group']],  # [[ ]] errors if 'group' is absent
            se = sqrt(vcov(mod_1)['group', 'group']),  # SE of the 'group' coefficient
            df = df.residual(mod_1)
        ),
        trt_2 = list(
            est = coef(mod_2)[['group']],  # [[ ]] errors if 'group' is absent
            se = sqrt(vcov(mod_2)['group', 'group']),  # SE of the 'group' coefficient
            df = df.residual(mod_2)
        )
    )
    return(x)
}
}
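
One way to catch structural mistakes early is to check the return value of \code{fun} on a single
dataset before running the full analysis. The helper below, \code{check_fun_output()}, is hypothetical
(it is not part of \code{rbmi}) and only encodes the requirements described above; \code{toy_dat}
and \code{toy_lm_fun()} are likewise made up for illustration.
\preformatted{
# Hypothetical helper: checks the shape of a value returned by an analysis function
check_fun_output <- function(x, need_se_df = FALSE) {
    required <- if (need_se_df) c("est", "se", "df") else "est"
    stopifnot(is.list(x), length(x) > 0, !is.null(names(x)), all(names(x) != ""))
    for (nm in names(x)) {
        el <- x[[nm]]
        stopifnot(is.list(el), all(is.element(required, names(el))))
        for (field in required) {
            val <- el[[field]]
            stopifnot(is.numeric(val), length(val) == 1)
        }
    }
    invisible(TRUE)
}

# Toy data with a numeric group indicator, so the coefficient is named "group"
toy_dat <- data.frame(outcome = c(1, 3, 2, 5), group = c(0, 0, 1, 1))
toy_lm_fun <- function(dat, ...) {
    mod <- lm(outcome ~ group, data = dat)
    list(trt = list(
        est = coef(mod)[["group"]],
        se = sqrt(vcov(mod)["group", "group"]),
        df = df.residual(mod)
    ))
}
ok <- check_fun_output(toy_lm_fun(toy_dat), need_se_df = TRUE)
}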

Please note that the \code{vars$subjid} column (as defined in the original call to
the draws() function from the rbmi package) will be scrambled in the data.frames that are provided to \code{fun}.
That is, it will not contain the original subject IDs, so hard-coding
subject IDs in \code{fun} must be avoided.

By default \code{fun} is the \code{\link[=rbmi_ancova]{rbmi_ancova()}} function.
Please note that this function
requires that a \code{vars} object, as created by the set_vars() function from the rbmi package, is provided via
the \code{vars} argument e.g. \code{rbmi_analyse(imputeObj, vars = set_vars(...))}. Please
see the documentation for \code{\link[=rbmi_ancova]{rbmi_ancova()}} for full details.
Please also note that the theoretical justification for the conditional mean imputation
method (\code{method = method_condmean()} in the draws() function from the rbmi package) relies on the fact that ANCOVA is
a linear transformation of the outcomes.
Thus care is required when applying alternative analysis functions in this setting.

The \code{delta} argument can be used to specify offsets to be applied
to the outcome variable in the imputed datasets prior to the analysis.
This is typically used for sensitivity or tipping point analyses. The
delta dataset must contain columns \code{vars$subjid}, \code{vars$visit} (as specified
in the original call to the draws() function from the rbmi package) and \code{delta}. Essentially this \code{data.frame}
is merged onto the imputed dataset by \code{vars$subjid} and \code{vars$visit} and then
the outcome variable is modified by:

\if{html}{\out{<div class="sourceCode">}}\preformatted{imputed_data[[vars$outcome]] <- imputed_data[[vars$outcome]] + imputed_data[['delta']]
}\if{html}{\out{</div>}}

Please note that in order to provide maximum flexibility, the \code{delta} argument
can be used to modify any/all outcome values including those that were not
imputed. Care must be taken when defining offsets. It is recommended that you
use the helper function delta_template() from the rbmi package to define the delta datasets as
this provides utility variables such as \code{is_missing} which can be used to identify
exactly which visits have been imputed.
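
As a rough illustration of why \code{is_missing} is useful, the following base R sketch builds
offsets that only affect imputed values. The \code{toy_template} data.frame is a hypothetical
stand-in for the output of delta_template(), and the column names \code{id}, \code{visit} and
\code{outcome} are assumptions made for the example.
\preformatted{
# Hypothetical stand-in for a delta template: one row per subject and visit
toy_template <- data.frame(
    id = c(1, 1, 2, 2),
    visit = c(1, 2, 1, 2),
    is_missing = c(FALSE, TRUE, FALSE, FALSE),
    delta = 0
)

# Apply an offset of 5 to imputed values only; observed values keep delta = 0
toy_template$delta <- ifelse(toy_template$is_missing, 5, 0)

# Toy imputed dataset; the value 12 for subject 1 at visit 2 was imputed
toy_imputed <- data.frame(
    id = c(1, 1, 2, 2),
    visit = c(1, 2, 1, 2),
    outcome = c(10, 12, 9, 11)
)

# Merge by subject and visit, then shift the outcome by delta
merged <- merge(toy_imputed, toy_template, by = c("id", "visit"))
merged$outcome <- merged$outcome + merged$delta
}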
}
\section{Parallelisation}{

To speed up the evaluation of \code{rbmi_analyse()} you can use the \code{cluster_or_cores} argument to enable parallelisation.
Simply providing an integer will get \code{rbmi} to automatically spawn that many background processes
to parallelise across. If you are using a custom analysis function then you need to ensure
that any libraries or global objects required by your function are available in the
sub-processes. To do this, use the \code{\link[=make_rbmi_cluster]{make_rbmi_cluster()}} function, for example:

\if{html}{\out{<div class="sourceCode">}}\preformatted{my_custom_fun <- function(dat, ...) {
    # <some analysis code>
}
cl <- make_rbmi_cluster(
    4,
    objects = list('my_custom_fun' = my_custom_fun),
    packages = c('dplyr', 'nlme')
)
rbmi_analyse(
    imputations = imputeObj,
    fun = my_custom_fun,
    cluster_or_cores = cl
)
parallel::stopCluster(cl)
}\if{html}{\out{</div>}}

Note that there is significant overhead both with setting up the sub-processes and with
transferring data back-and-forth between the main process and the sub-processes. As such
parallelisation of the \code{rbmi_analyse()} function tends to only be worth it when you have
\verb{> 2000} samples generated by the draws() function from the rbmi package.
Conversely, using parallelisation when you have fewer samples
than this may lead to longer run times than running sequentially.

It is important to note that the implementation of parallel processing within the analyse()
function from the rbmi package has been optimised around the assumption that the parallel
processes will be spawned on the same machine and not a remote cluster.
One such optimisation is that the required data is saved to
a temporary file on the local disk from which it is then read into each sub-process. This is
done to avoid the overhead of transferring the data over the network. Our assumption is that
if you are at the stage where you need to be parallelising your analysis over a remote cluster
then you would likely be better off parallelising across multiple \code{rbmi} runs rather than within
a single \code{rbmi} run.

Finally, if you are doing a tipping point analysis you can get a reasonable performance
improvement by re-using the cluster between each call to \code{rbmi_analyse()} e.g.

\if{html}{\out{<div class="sourceCode">}}\preformatted{cl <- make_rbmi_cluster(4)
ana_1 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_1,
    cluster_or_cores = cl
)
ana_2 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_2,
    cluster_or_cores = cl
)
ana_3 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_3,
    cluster_or_cores = cl
)
parallel::stopCluster(cl)
}\if{html}{\out{</div>}}
}

\examples{

 library(rbmi)
 library(dplyr)

 dat <- antidepressant_data
 dat$GENDER <- as.factor(dat$GENDER)
 dat$POOLINV <- as.factor(dat$POOLINV)
 set.seed(123)
 pat_ids <- sample(levels(dat$PATIENT), nlevels(dat$PATIENT) / 4)
 dat <- dat |>
   filter(PATIENT \%in\% pat_ids) |>
   droplevels()
 dat <- expand_locf(
   dat,
   PATIENT = levels(dat$PATIENT),
   VISIT = levels(dat$VISIT),
   vars = c("BASVAL", "THERAPY"),
   group = c("PATIENT"),
   order = c("PATIENT", "VISIT")
 )
 dat_ice <- dat |>
   arrange(PATIENT, VISIT) |>
   filter(is.na(CHANGE)) |>
   group_by(PATIENT) |>
   slice(1) |>
   ungroup() |>
   select(PATIENT, VISIT) |>
   mutate(strategy = "JR")
 dat_ice <- dat_ice[dat_ice$PATIENT != 3618, ]
 vars <- set_vars(
   outcome = "CHANGE",
   visit = "VISIT",
   subjid = "PATIENT",
   group = "THERAPY",
   covariates = c("THERAPY")
 )
 drawObj <- draws(
   data = dat,
   data_ice = dat_ice,
   vars = vars,
   method = method_condmean(type = "jackknife", covariance = "csh"),
   quiet = TRUE
 )
 references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
 imputeObj <- impute(drawObj, references)

 rbmi_analyse(imputations = imputeObj, vars = vars)

}
\seealso{
The extract_imputed_dfs() function from the rbmi package for manually extracting imputed
datasets.

The delta_template() function from the rbmi package for creating delta data.frames.

\code{\link[=rbmi_ancova]{rbmi_ancova()}} for the default analysis function.
}
