% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/16_PARTITIONS.R
\name{create.partitions}
\alias{create.partitions}
\title{Create partitions (aka nested dummy variables)}
\usage{
create.partitions(db)
}
\arguments{
\item{db}{Data set of risk factors to be converted into partitions.}
}
\value{
The command \code{create.partitions} returns a list of two objects (data frames).\cr
The first object (\code{partitions}) is the data set with the newly created nested dummy variables.\cr
The second object (\code{info}) is a data frame that reports on the partitioning process.
A set of quality checks is performed, and any issues observed are reported. Two of the checks are terminal,
i.e. if triggered, the risk factor is not processed further (fewer than two non-missing groups, or more than 10 modalities),
while the third only issues a warning, since it usually indicates a deviation from the main principles of risk factor processing
(less than 5\% of observations per bin).
}
\description{
\code{create.partitions} creates partitions (aka nested dummy variables).
Used directly in logistic regression, partitions provide insight into the difference in log-odds between adjacent risk factor bins (groups).
Adjacent bins are determined by the alphabetical order of the risk factor modalities; it is therefore important
to ensure that modality labels are defined in line with the expected monotonicity or any other criterion
considered while engineering the risk factors.
}
\examples{
suppressMessages(library(PDtoolkit))
data(loans)
#identify numeric risk factors
num.rf <- sapply(loans, is.numeric)
num.rf <- names(num.rf)[!names(num.rf)\%in\%"Creditability" & num.rf]
#discretize numeric risk factors using cum.bin from monobin package
loans[, num.rf] <- sapply(num.rf, function(x) 
cum.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
str(loans)
loans.p <- create.partitions(db = loans[, num.rf])
head(loans.p[["partitions"]])
loans.p[["info"]]
#bring target to partitions
db.p <- cbind.data.frame(Creditability = loans$Creditability, loans.p[[1]])
#prepare risk factors for stepMIV 
db.p[, -1] <- sapply(db.p[, -1], as.character)
#run stepMIV
res <- stepMIV(start.model = Creditability ~ 1, 
   miv.threshold = 0.02, 
   m.ch.p.val = 0.05,
   coding = "dummy",
   db = db.p)
#check output elements
names(res)
#extract the final model
final.model <- res$model
#print coefficients
summary(final.model)$coefficients
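#illustrative sketch only (not the internal code of create.partitions):
#the idea of nested dummies shown on a hypothetical toy risk factor with three bins,
#assuming cumulative coding in which each dummy flags bins at or above a given level;
#the exact coding and column names returned by create.partitions may differ
rf.toy <- c("01 low", "02 mid", "03 high", "02 mid")
#bins are ordered alphabetically, hence the numeric prefixes in the labels
lvl.toy <- sort(unique(rf.toy))
#one dummy per bin except the first (reference) bin
nested.toy <- sapply(lvl.toy[-1], function(l) as.integer(rf.toy >= l))
nested.toy
#under this assumed coding, each coefficient in a logistic regression
#measures the log-odds difference between a bin and the adjacent lower bin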
}
\references{
Scallan, G. (2011). Class(ic) Scorecards: Selecting Characteristics and Attributes in Logistic Regression,
Edinburgh Credit Scoring Conference.
}
