Models

class skfair.linear_model.DemographicParityClassifier

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on demographic parity (p% score).

Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier.

Only works for binary classification problems.

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification
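As a rough illustration of the constraint above, the sketch below computes the empirical covariance between one sensitive attribute z and the signed distance to the decision boundary, then checks it against a threshold c. It assumes a linear model where d_θ(x) = θᵀx, as in Zafar et al. The helper names boundary_covariance and satisfies_constraint and the toy data are hypothetical and are not part of skfair.

```python
def boundary_covariance(z, distances):
    """Return (1/N) * sum((z_i - mean(z)) * d_i) for one sensitive attribute."""
    n = len(z)
    if n == 0 or n != len(distances):
        raise ValueError("z and distances must be non-empty and of equal length")
    z_mean = sum(z) / n
    return sum((zi - z_mean) * di for zi, di in zip(z, distances)) / n


def satisfies_constraint(z, distances, c):
    """Both inequalities together are equivalent to |covariance| <= c."""
    return abs(boundary_covariance(z, distances)) <= c


# Toy data (hypothetical): two features, one binary sensitive attribute.
theta = [1.0, -2.0]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
z = [1, 0, 1, 0]

# Assumed signed distance for a linear model: d_theta(x) = theta . x
distances = [sum(t * x for t, x in zip(theta, row)) for row in X]

cov = boundary_covariance(z, distances)
print(cov)
print(satisfies_constraint(z, distances, c=0.1))
print(satisfies_constraint(z, distances, c=1.0))
```

A smaller c forces the decision boundary to be less dependent on the sensitive attribute, usually at some cost in log loss.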

Parameters
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.

  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).

  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

  • penalty – Specifies the norm used in the penalization. Expects ‘none’ or ‘l1’.

  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • max_iter – Maximum number of iterations taken for the solvers to converge.

  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.

  • multi_class – The method to use for multiclass predictions.

  • n_jobs – The number of parallel jobs that should be used to fit multiclass models.

class skfair.linear_model.EqualOpportunityClassifier

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on the equal opportunity score.

Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier for those examples that have a y_true of 1.

Only works for binary classification problems.

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

where POS is the subset of the population for which y_true = 1.
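The restriction to positive examples can be sketched as follows: the same covariance is computed, but only over rows whose true label equals the positive class. This is an illustration under stated assumptions, not the library's implementation. In particular, the mean of z is taken over the positive subset here, which the formula leaves open, and the function name positive_class_covariance and the toy data are hypothetical.

```python
def positive_class_covariance(z, distances, y_true, positive_target=1):
    """Covariance between z and the boundary distance over rows with y_true == positive_target."""
    pos = [(zi, di) for zi, di, yi in zip(z, distances, y_true) if yi == positive_target]
    if not pos:
        raise ValueError("no examples with the positive target")
    # Assumption: z is centered using the positive subset only.
    z_mean = sum(zi for zi, _ in pos) / len(pos)
    return sum((zi - z_mean) * di for zi, di in pos) / len(pos)


# Toy data (hypothetical): distances are assumed to be precomputed d_theta(x_i).
z = [1, 0, 1, 0, 1, 0]
distances = [2.0, 1.0, -1.0, 0.5, 1.0, -2.0]
y_true = [1, 1, 0, 1, 1, 0]

pos_cov = positive_class_covariance(z, distances, y_true)
print(pos_cov)
print(abs(pos_cov) <= 0.2)  # check against a threshold c = 0.2
```

Rows with y_true = 0 do not enter the constraint at all, so the fairness requirement applies only to examples that deserve the positive outcome.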

Parameters
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.

  • positive_target – The name of the class which is associated with a positive outcome.

  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).

  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

  • penalty – Specifies the norm used in the penalization. Expects ‘none’ or ‘l1’.

  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • max_iter – Maximum number of iterations taken for the solvers to converge.

  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.

  • multi_class – The method to use for multiclass predictions.

  • n_jobs – The number of parallel jobs that should be used to fit multiclass models.