# Preprocessing

class skfair.preprocessing.InformationFilter(columns, alpha=1)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

The InformationFilter uses a variant of the Gram-Schmidt process to filter information out of a dataset. This can be useful when certain columns must not influence a model for fairness reasons. To explain how it works, consider a training matrix $$X$$ with columns $$x_1, ..., x_k$$. If columns $$x_1$$ and $$x_2$$ are the sensitive columns, the information filter removes their information by applying the following transformations:

$\begin{split} v_1 & = x_1 \\ v_2 & = x_2 - \frac{x_2 \cdot v_1}{v_1 \cdot v_1} v_1 \\ v_3 & = x_3 - \frac{x_3 \cdot v_1}{v_1 \cdot v_1} v_1 - \frac{x_3 \cdot v_2}{v_2 \cdot v_2} v_2 \\ & \;\; \vdots \\ v_k & = x_k - \frac{x_k \cdot v_1}{v_1 \cdot v_1} v_1 - \frac{x_k \cdot v_2}{v_2 \cdot v_2} v_2 \end{split}$

Concatenating the resulting vectors, with the sensitive ones $$v_1$$ and $$v_2$$ removed, gives a new training matrix $$X_{fair} = [v_3, ..., v_k]$$ whose columns are orthogonal to the sensitive columns.
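The transformations above can be sketched in plain NumPy. This is an illustration of the procedure as described here, not the library's source code; the helper names `project` and `filter_information` and the random data are assumptions made for the example.

```python
import numpy as np


def project(u, v):
    """Projection of vector u onto vector v."""
    return (u @ v) / (v @ v) * v


def filter_information(X, sensitive):
    """Return X_fair: the non-sensitive columns with the sensitive part removed."""
    X = np.asarray(X, dtype=float)

    # Step 1: orthogonalise the sensitive columns among themselves (v_1, v_2, ...).
    vs = []
    for idx in sensitive:
        x = X[:, idx]
        vs.append(x - sum(project(x, prev) for prev in vs))

    # Step 2: from every remaining column, subtract its projection onto each v_i.
    keep = [j for j in range(X.shape[1]) if j not in sensitive]
    columns = [X[:, j] - sum(project(X[:, j], v) for v in vs) for j in keep]
    return np.column_stack(columns)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 2] += 2.0 * X[:, 0]  # column 2 leaks information from sensitive column 0

X_fair = filter_information(X, sensitive=[0, 1])

# Inner products between sensitive and filtered columns should be close to zero.
print(np.abs(X[:, [0, 1]].T @ X_fair).max())
```

The sketch assumes the sensitive columns are linearly independent; otherwise one of the $$v_i$$ becomes the zero vector and the division in `project` fails.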

Parameters
• columns – the columns to filter out; this can be a sequence of either int (in the case of numpy) or string (in the case of pandas).

• alpha – controls how much to filter: with alpha=1 all information from the sensitive columns is filtered out, while with alpha=0 no filtering is applied.
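The text only pins down the two endpoints of alpha. One plausible reading, assumed here and not confirmed by this page, is a linear blend between the unfiltered and the fully filtered columns. The helper `blend_filtered` below is hypothetical and only illustrates that reading on a single feature.

```python
import numpy as np


def blend_filtered(x_raw, x_fair, alpha=1.0):
    """Hypothetical helper: interpolate between unfiltered and fully filtered data."""
    return alpha * x_fair + (1.0 - alpha) * x_raw


rng = np.random.default_rng(1)
s = rng.normal(size=100)             # one sensitive column
x = 0.8 * s + rng.normal(size=100)   # a feature correlated with it
x_fair = x - (x @ s) / (s @ s) * s   # fully filtered version of x

# Under this assumption the inner product with the sensitive column
# shrinks linearly as alpha goes from 0 to 1.
for alpha in (0.0, 0.5, 1.0):
    x_alpha = blend_filtered(x, x_fair, alpha)
    print(alpha, float(x_alpha @ s))
```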

fit(X, y=None)

Learn the projection required to make the dataset orthogonal to sensitive columns.

transform(X)

Transforms X by applying the information filter.
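To illustrate the split between `fit` and `transform`, here is a minimal stand-in written with NumPy only. The class `InformationFilterSketch` is hypothetical and is not the skfair implementation: it assumes the learned projection can be represented by least-squares coefficients that are estimated once on the training data and then reused on new data.

```python
import numpy as np


class InformationFilterSketch:
    """Hypothetical stand-in, not the skfair class."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        S = X[:, self.columns]
        rest = np.delete(X, self.columns, axis=1)
        # Least-squares coefficients that explain each remaining column
        # from the sensitive columns; subtracting S @ coef_ removes that part.
        self.coef_, *_ = np.linalg.lstsq(S, rest, rcond=None)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        S = X[:, self.columns]
        rest = np.delete(X, self.columns, axis=1)
        return rest - S @ self.coef_


rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 3))
X_train[:, 1] += X_train[:, 0]       # sensitive column 0 leaks into column 1
X_new = rng.normal(size=(5, 3))

filt = InformationFilterSketch(columns=[0]).fit(X_train)
X_train_fair = filt.transform(X_train)
X_new_fair = filt.transform(X_new)   # reuses the coefficients learned in fit

print(X_train_fair.shape, X_new_fair.shape)  # (200, 2) (5, 2)
```

On the training data the output is exactly orthogonal to the sensitive column (up to floating-point error); on new data the same coefficients are applied, so orthogonality holds only approximately.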