% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filters.R
\name{lm_filter}
\alias{lm_filter}
\title{Linear model filter}
\usage{
lm_filter(
  y,
  x,
  force_vars = NULL,
  nfilter = NULL,
  p_cutoff = 0.05,
  rsq_cutoff = NULL,
  rsq_method = "pearson",
  type = c("index", "names", "full"),
  keep_factors = TRUE,
  method = 0L,
  mc.cores = 1
)
}
\arguments{
\item{y}{Numeric or integer response vector}

\item{x}{Matrix of predictors. If \code{x} is a data.frame it is coerced to a
matrix. Note that factors are reduced to numeric values and a full design
matrix is not generated, so if factors have 3 or more levels, it is
recommended to convert \code{x} into a design (model) matrix first.}

\item{force_vars}{Vector of column names in \code{x} which are included as
covariates in every linear model.}

\item{nfilter}{Number of predictors to return. If \code{NULL} all predictors with
p-values < \code{p_cutoff} are returned.}

\item{p_cutoff}{p-value cut-off. P-values are calculated from the t-statistic
of the estimated coefficient for the predictor being tested.}

\item{rsq_cutoff}{r^2 cutoff for removing predictors due to collinearity.
Default \code{NULL} means no collinearity filtering. Predictors are ranked based
on AIC from a linear model. If 2 or more predictors are collinear, the
predictor ranked first by AIC is retained and the other collinear
predictors are removed. See \code{\link[=collinear]{collinear()}}.}

\item{rsq_method}{character string indicating which correlation coefficient
is to be computed. One of "pearson" (default), "kendall", or "spearman".
See \code{\link[=collinear]{collinear()}}.}

\item{type}{Type of object returned. Default "index" returns indices, "names"
returns predictor names, "full" returns a matrix of model statistics
including p-values.}

\item{keep_factors}{Logical affecting factors with 3 or more levels.
Dataframes are coerced to a matrix using \link{data.matrix}. Binary
factors are converted to numeric values 0/1 and analysed as such. If
\code{keep_factors} is \code{TRUE} (the default), factors with 3 or more levels are
not filtered and are retained. If \code{keep_factors} is \code{FALSE}, they are
removed.}

\item{method}{Integer determining linear model method. See
\code{\link[RcppEigen:fastLm]{RcppEigen::fastLmPure()}}.}

\item{mc.cores}{Number of cores for parallelisation using
\code{\link[parallel:mclapply]{parallel::mclapply()}}.}
}
\value{
Integer vector of indices of filtered parameters (\code{type = "index"})
or character vector of names (\code{type = "names"}) of filtered parameters in
order of linear model AIC. Any variables in \code{force_vars} which are
incorporated into all models are listed first. If \code{type = "full"} a matrix
of AIC value, sigma (residual standard error, see \link{summary.lm}),
coefficient, t-statistic and p-value for each tested predictor is returned.
}
\description{
A linear model is fitted for each predictor in turn, with the variables named
in \code{force_vars} included in every model. Predictors are ranked by Akaike
information criterion (AIC) value, or can be filtered by the p-value of the
estimated coefficient for that predictor in its model.
}
\details{
This filter is based on the model \code{y ~ xvar + force_vars} where \code{y} is the
response vector, \code{xvar} are variables in columns taken sequentially from \code{x}
and \code{force_vars} are optional covariates extracted from \code{x}. It uses
\code{\link[RcppEigen:fastLm]{RcppEigen::fastLmPure()}} with \code{method = 0} as default since it is
rank-revealing. \code{method = 3} is significantly faster but can give errors in
the estimated p-values for variables with zero variance. The algorithm attempts
to detect these and set their statistics to \code{NA}. \code{NA} values in \code{x} are not
tolerated.

Parallelisation is available via \code{\link[parallel:mclapply]{parallel::mclapply()}}. This is
provided mainly for the use case of the filter being used standalone. Nesting
parallelisation inside parallelised \code{\link[=nestcv.glmnet]{nestcv.glmnet()}} or
\code{\link[=nestcv.train]{nestcv.train()}} loops is not recommended.
}
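\examples{
## A minimal sketch on simulated data, following the usage shown above.
## The objects `x` and `y` and the column names "v1".."v20" are illustrative
## assumptions, not part of the package.
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("v", 1:20)))
y <- 2 * x[, "v1"] - x[, "v2"] + rnorm(100)

## Indices of predictors passing the default p-value cut-off
lm_filter(y, x)

## Names of up to 5 top-ranked predictors
lm_filter(y, x, nfilter = 5, type = "names")

## Matrix of statistics, with "v3" forced into every model as a covariate
head(lm_filter(y, x, force_vars = "v3", type = "full"))

## The per-predictor model described in Details has the same form as this
## stats::lm() fit. It is shown only to illustrate the model; the numerical
## output is not guaranteed to match the filter exactly.
fit <- lm(y ~ x[, "v1"] + x[, "v3"])
summary(fit)$coefficients[2, ]
AIC(fit)
}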
