% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/drglm.R
\name{drglm}
\alias{drglm}
\title{Fitting Linear and Generalized Linear Models to Large Data Sets Using the "Divide and Recombine" Approach}
\usage{
drglm(formula, family, data, k, fitfunction)
}
\arguments{
\item{formula}{An object of class \code{"formula"} (or one that can be coerced to that class): a symbolic description of the model to be fitted. See \code{\link[stats]{formula}} for how models are specified.}

\item{family}{A description of the error distribution to be used in the model, for example \code{"gaussian"} or \code{"binomial"}.}

\item{data}{An optional data frame, list, or environment containing the variables in the model.}

\item{k}{Number of subsets (chunks) into which the data are divided.}

\item{fitfunction}{The function used for model fitting: \code{glm} or \code{speedglm}. For multinomial models, \code{multinom} is preferred.}
}
\value{
A generalized linear model fitted with the "Divide and Recombine" approach, using \code{k} chunks of the data set. The result is a list of model coefficients estimated by the divide and recombine method, together with their standard errors.
}
\description{
\code{drglm} fits GLMs to data sets that are too large to be held in memory. It uses the divide and recombine technique to handle large data sets efficiently. Performance improves when R is linked against an optimized BLAS library such as ATLAS. The function requires the number of chunks \code{k} and the \code{fitfunction} to be specified. The remaining arguments are almost identical to those of the speedglm and biglm packages.
}
\examples{
set.seed(123)
#Number of rows to be generated
n <- 10000
#creating dataset
dataset <- data.frame( pred_1 = round(rnorm(n, mean = 50, sd = 10)),
pred_2 = round(rnorm(n, mean = 7.5, sd = 2.1)),
pred_3 = as.factor(sample(c("0", "1"), n, replace = TRUE)),
pred_4 = as.factor(sample(c("0", "1", "2"), n, replace = TRUE)),
pred_5 = as.factor(sample(0:15, n, replace = TRUE)),
pred_6 = round(rnorm(n, mean = 60, sd = 5)))
#fitting multiple linear regression model
nmodel= drglm::drglm(pred_1 ~ pred_2+ pred_3+ pred_4+ pred_5+ pred_6,
data=dataset, family="gaussian", fitfunction="speedglm", k=10)
#Output
nmodel
#fitting simple logistic regression model
bmodel=drglm::drglm(pred_3~ pred_1+ pred_2+ pred_4+ pred_5+ pred_6,
data=dataset, family="binomial", fitfunction="speedglm", k=10)
#Output
bmodel
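#fitting poisson regression model
#pred_5 is stored as a factor, so a numeric copy (count_5) is used as the count response
dataset$count_5 <- as.numeric(as.character(dataset$pred_5))
pmodel=drglm::drglm(count_5~ pred_1+ pred_2+ pred_3+ pred_4+ pred_6,
data=dataset, family="poisson", fitfunction="speedglm", k=10)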
#Output
pmodel
#fitting multinomial logistic regression model
mmodel=drglm::drglm(pred_4~ pred_1+ pred_2+ pred_3+ pred_5+ pred_6,
data=dataset, family="multinomial", fitfunction="multinom", k=10)
#Output
mmodel
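#Illustrative sketch (not the package's internal code) of the divide and
#recombine idea described above, using base R only.
#Assumption: chunk estimates are pooled by weighting each chunk with its
#estimated information matrix; drglm itself may combine chunks differently.
#The names toy, n_chunks, chunk_id, fits, info, weighted_sum, pooled_vcov
#and pooled_coef are hypothetical.
set.seed(1)
toy <- data.frame(x = rnorm(2000))
toy$y <- rbinom(2000, 1, plogis(-0.5 + 0.8 * toy$x))
#assign every row to one of n_chunks subsets
n_chunks <- 4
chunk_id <- rep(seq_len(n_chunks), length.out = nrow(toy))
#fit the same logistic model separately on each subset
fits <- lapply(split(toy, chunk_id), function(chunk)
  glm(y ~ x, family = binomial(), data = chunk))
#information matrix of each subset: inverse of its coefficient covariance
info <- lapply(fits, function(f) solve(vcov(f)))
#pooled covariance is the inverse of the summed information matrices
pooled_vcov <- solve(Reduce("+", info))
#pooled coefficients: information-weighted average of the subset coefficients
#(crossprod(a, b) computes t(a) times b; a is symmetric here)
weighted_sum <- Reduce("+", Map(function(i, f) crossprod(i, coef(f)), info, fits))
pooled_coef <- crossprod(pooled_vcov, weighted_sum)
#pooled estimates with their standard errors
cbind(estimate = drop(pooled_coef), std_error = sqrt(diag(pooled_vcov)))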
}
\references{
\itemize{
\item Xi, R., Lin, N., & Chen, Y. (2009). Compression and aggregation for logistic regression analysis in data cubes. IEEE Transactions on Knowledge and Data Engineering, 21(4).
\item Chen, Y., Dong, G., Han, J., Pei, J., Wah, B. W., & Wang, J. (2006). Regression cubes with lossless compression and aggregation. IEEE Transactions on Knowledge and Data Engineering, 18(12).
\item Zuo, W., & Li, Y. (2018). A New Stochastic Restricted Liu Estimator for the Logistic Regression Model. Open Journal of Statistics, 08(01).
\item Karim, M. R., & Islam, M. A. (2019). Reliability and Survival Analysis. In Reliability and Survival Analysis.
\item Enea, M. (2009). Fitting Linear Models and Generalized Linear Models with large data sets in R.
\item Bates, D. (2009). Technical Report on Least Square Calculations.
\item Lumley, T. (2009). \emph{biglm} package documentation.
}
}
\seealso{
\code{\link{big.drglm}}, \code{\link{drglm.multinom}}
}
\author{
MH Nayem
}
