Models

class skfair.linear_model.DemographicParityClassifier
Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin
A logistic regression classifier which can be constrained on demographic parity (p% score).
Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier.
Only works for binary classification problems.
\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} \mid \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]
Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification
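To make the constraint concrete, the following is a minimal sketch of the constrained quantity, the covariance between a sensitive attribute z and the distance to the decision boundary. It is not the library's implementation: the helper names and the toy data are hypothetical, and the distance is assumed to be the unnormalised linear decision value theta . x + intercept.

```python
def signed_distance(theta, intercept, x):
    """Unnormalised signed distance of one sample: theta . x + intercept (assumed form)."""
    return sum(t * xi for t, xi in zip(theta, x)) + intercept


def boundary_covariance(z, d):
    """Return (1/N) * sum((z_i - mean(z)) * d_i) for one sensitive attribute."""
    n = len(z)
    z_bar = sum(z) / n
    return sum((zi - z_bar) * di for zi, di in zip(z, d)) / n


# Hypothetical toy data: two features per sample and one binary sensitive attribute.
X = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, -1.0]]
z = [1, 1, 0, 0]
theta, intercept = [1.0, 0.0], 0.0  # hypothetical model weights

d = [signed_distance(theta, intercept, x) for x in X]
cov = boundary_covariance(z, d)

c = 0.5  # plays the role of covariance_threshold
satisfied = -c <= cov <= c  # both inequalities of the constraint at once
print(cov, satisfied)
```

With these weights the decision value tracks the sensitive attribute closely, so the covariance exceeds the threshold and the constraint is violated; a constrained fit would have to move theta until the covariance falls inside [-c, c].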
 Parameters
covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.
sensitive_cols – List of sensitive column names (when X is a dataframe) or list of column indices (when X is a numpy array).
C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
penalty – The norm used in the penalization. Expects ‘none’ or ‘l1’.
fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
max_iter – Maximum number of iterations taken for the solvers to converge.
train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.
multi_class – The method to use for multiclass predictions.
n_jobs – The number of parallel jobs that should be used to fit multiclass models.
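The p% score mentioned in the class description can be checked on a set of predictions. The sketch below assumes the usual definition from Zafar et al., the smaller of the two ratios between the groups' positive prediction rates, for a single binary sensitive attribute. The function name, its handling of zero rates, and the data are choices made for this illustration, not the library's API.

```python
def p_percent_score(y_pred, z, positive=1):
    """min(P(y_hat=1 | z=1) / P(y_hat=1 | z=0), and the inverse ratio).

    Assumes z is binary (0/1) and that both groups contain at least one sample.
    """
    group1 = [p for p, zi in zip(y_pred, z) if zi == 1]
    group0 = [p for p, zi in zip(y_pred, z) if zi == 0]
    rate1 = sum(p == positive for p in group1) / len(group1)
    rate0 = sum(p == positive for p in group0) / len(group0)
    if rate1 == rate0:
        return 1.0  # identical rates (including both zero): treated as perfect parity here
    if rate1 == 0 or rate0 == 0:
        return 0.0  # one group never receives the positive outcome
    return min(rate1 / rate0, rate0 / rate1)


# Hypothetical predictions: group z=1 gets the positive class 3 times out of 4,
# group z=0 only once out of 4.
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
score = p_percent_score(y_pred, z)
print(score)
```

A score of 1 means both groups receive the positive prediction at the same rate; values near 0 indicate a large disparity.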

class skfair.linear_model.EqualOpportunityClassifier
Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin
A logistic regression classifier which can be constrained on the equal opportunity score.
Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier for those examples that have a y_true of 1.
Only works for binary classification problems.
\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} \mid \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]
where POS is the subset of the population where y_true = 1.
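The only change from the demographic parity constraint is that the average runs over the samples with y_true = 1. A minimal sketch of that restricted covariance follows. The function name and data are hypothetical, and the mean of z is taken over the positive subset here, which is an assumption of this sketch; the library may centre z over the full population instead.

```python
def positive_subset_covariance(z, d, y_true, positive_target=1):
    """Covariance between z and the boundary distance d, using only positive samples."""
    pos = [(zi, di) for zi, di, yi in zip(z, d, y_true) if yi == positive_target]
    n_pos = len(pos)
    # Assumption of this sketch: z is centred using the positive subset only.
    z_bar = sum(zi for zi, _ in pos) / n_pos
    return sum((zi - z_bar) * di for zi, di in pos) / n_pos


# Hypothetical data: the last two samples are negatives and are ignored.
z = [1, 1, 0, 0, 1, 0]
d = [2.0, 1.0, -1.0, 0.5, -3.0, 1.0]  # decision values, assumed already computed
y_true = [1, 1, 1, 1, 0, 0]

cov_pos = positive_subset_covariance(z, d, y_true)
c = 0.5  # plays the role of covariance_threshold
print(cov_pos, -c <= cov_pos <= c)
```

Because negatives never enter the sum, the constraint only limits how differently the classifier treats the groups among samples that deserve the positive outcome.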
 Parameters
covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.
positive_target – The name of the class which is associated with a positive outcome.
sensitive_cols – List of sensitive column names (when X is a dataframe) or list of column indices (when X is a numpy array).
C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
penalty – The norm used in the penalization. Expects ‘none’ or ‘l1’.
fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
max_iter – Maximum number of iterations taken for the solvers to converge.
train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.
multi_class – The method to use for multiclass predictions.
n_jobs – The number of parallel jobs that should be used to fit multiclass models.
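An equal opportunity score can be evaluated on predictions in the same spirit as the p% score, but restricted to samples whose true label is the positive target. The sketch below assumes the score is the smaller ratio between the two groups' true positive rates for one binary sensitive attribute; the function name, the zero-rate handling, and the data are hypothetical.

```python
def equal_opportunity_score(y_true, y_pred, z, positive_target=1):
    """Smaller ratio between the true positive rates of groups z=1 and z=0.

    Assumes z is binary (0/1) and each group has at least one positive sample.
    """
    def true_positive_rate(group):
        preds = [p for t, p, zi in zip(y_true, y_pred, z)
                 if zi == group and t == positive_target]
        return sum(p == positive_target for p in preds) / len(preds)

    rate1, rate0 = true_positive_rate(1), true_positive_rate(0)
    if rate1 == rate0:
        return 1.0  # identical rates (including both zero): treated as parity here
    if rate1 == 0 or rate0 == 0:
        return 0.0  # one group's positives are never recognised
    return min(rate1 / rate0, rate0 / rate1)


# Hypothetical labels: group z=1 has 4 true positives (3 found),
# group z=0 has 2 true positives (1 found).
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
eo_score = equal_opportunity_score(y_true, y_pred, z)
print(eo_score)
```

The false positive on the fifth sample does not affect the result, since only samples with a true positive label are counted.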