% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/adult.R
\docType{data}
\name{adult}
\alias{adult}
\alias{adult_test}
\alias{adult_train}
\title{Adult Dataset}
\source{
Dua, Dheeru, Graff, Casey (2017).
\dQuote{UCI Machine Learning Repository.}
\url{http://archive.ics.uci.edu/ml/}.
Ding, Frances, Hardt, Moritz, Miller, John, Schmidt, Ludwig (2021).
\dQuote{Retiring adult: New datasets for fair machine learning.}
In \emph{Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)}.
}
\description{
Dataset used to predict whether income exceeds $50K/yr based on census data.
Also known as the "Census Income" dataset.
The training set contains 13 features and 30178 observations.
The test set contains 13 features and 15315 observations.
The target column is \code{"target"}, a binary factor where 1: <=50K and 2: >50K for annual income.
The column \code{"sex"} is set as the protected attribute.
}
\section{Derived tasks}{

\itemize{
\item \code{adult_train}: Original train split for the adult task available at UCI.
\item \code{adult_test}: Original test split for the adult task available at UCI.
}
}

\section{Using Adult - Known Problems}{

The adult dataset has several known limitations, such as its age, limited documentation, and outdated feature encodings (Ding et al., 2021).
Furthermore, the selected threshold (income <=50K) has strong implications for the outcome of any analysis:
"In many cases, the $50k threshold understates and misrepresents the broader picture" (Ding et al., 2021).
As a result, conclusions about real-world implications are severely limited.

We nevertheless include the dataset here because it is a widely used benchmark dataset and can still serve that purpose.
}

\section{Pre-processing}{

\itemize{
\item \code{fnlwgt}: Removed. The final weight is the number of people the census believes the entry represents.
\item \code{native-country}: Removed. The native country is the country of origin of an individual.
\item Rows containing \code{NA} in \code{workclass} and \code{occupation} have been removed.
\item Pre-processing inspired by the article at \url{https://cseweb.ucsd.edu//classes/sp15/cse190-c/reports/sp15/048.pdf}.
}
}

\section{Metadata}{

\itemize{
\item (integer) age: the age of the individual.
\item (factor) workclass: a general term for the employment status of an individual.
\item (factor) education: the highest level of education achieved by an individual.
\item (integer) education_num: the highest level of education achieved, in numerical form.
\item (factor) marital_status: the marital status of an individual.
\item (factor) occupation: the general type of occupation of an individual.
\item (factor) relationship: whether the individual is in a relationship.
\item (factor) race: description of an individual’s race.
\item (factor) sex: the biological sex of the individual.
\item (integer) capital-gain: capital gains for an individual.
\item (integer) capital-loss: capital losses for an individual.
\item (integer) hours-per-week: the hours an individual has reported to work per week.
\item (factor) target: whether or not an individual makes more than $50,000 annually.
}
}

\examples{
library("mlr3")
data("adult_test", package = "mlr3fairness")
data("adult_train", package = "mlr3fairness")
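
# A minimal sketch, assuming the column names documented above
# ("target" as outcome and "sex" as the protected attribute):
# check the size of each split, then cross-tabulate the target
# by the protected attribute in the training data.
dim(adult_train)
dim(adult_test)
table(adult_train$sex, adult_train$target)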
}
\keyword{data}
