Measure Bias First, Then Fix It

You can’t fix what you can’t measure. Before touching your model, you need numbers that show exactly where, and by how much, it is biased. Fairlearn surfaces those numbers through MetricFrame, then provides algorithms to reduce the disparity.

Install it alongside scikit-learn:

pip install fairlearn==0.13.0 scikit-learn pandas matplotlib

Here’s a complete bias audit on the Adult Census dataset. This dataset predicts whether someone earns over $50K/year, and it’s notorious for encoding historical discrimination around sex and race.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, balanced_accuracy_score

from fairlearn.datasets import fetch_adult
from fairlearn.metrics import (
    MetricFrame,
    selection_rate,
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    false_positive_rate,
    true_positive_rate,
    count,
)

# Load and prepare data
data = fetch_adult()
X_raw = data.data
y = (data.target == ">50K") * 1

# Sex is our sensitive feature
A = X_raw["sex"]
X = pd.get_dummies(X_raw.drop(columns=["sex"]))

sc = StandardScaler()
X_scaled = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(
    X_scaled, y, A, test_size=0.3, random_state=42, stratify=y
)

# Reset indices to avoid alignment issues
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
A_test = A_test.reset_index(drop=True)

# Train a baseline model
model = LogisticRegression(solver="liblinear", max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Audit fairness
metrics = {
    "accuracy": accuracy_score,
    "selection_rate": selection_rate,
    "true_positive_rate": true_positive_rate,
    "false_positive_rate": false_positive_rate,
    "count": count,
}

mf = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=A_test,
)

print("=== Overall Metrics ===")
print(mf.overall)
print("\n=== Metrics by Group ===")
print(mf.by_group)
print("\n=== Group Differences ===")
print(mf.difference())

When you run this, you’ll see something like 85% overall accuracy, but a selection rate of around 30% for men versus around 10% for women. That gap is the problem: the model learned the historical pattern that men earn more, and it’s reproducing that inequality.

Understanding the Key Fairness Metrics

Not all fairness metrics are equal, and you need to pick the right one for your use case. Here’s what matters.

Demographic parity difference measures whether the model selects (predicts positive) at equal rates across groups. A value of 0 means perfect parity; as a rule of thumb, anything above 0.1 is a red flag.

dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=A_test)
dpr = demographic_parity_ratio(y_test, y_pred, sensitive_features=A_test)
eod = equalized_odds_difference(y_test, y_pred, sensitive_features=A_test)

print(f"Demographic parity difference: {dpd:.4f}")
print(f"Demographic parity ratio:      {dpr:.4f}")
print(f"Equalized odds difference:     {eod:.4f}")

Demographic parity ratio is the ratio of the lowest group selection rate to the highest. The legal standard from disparate impact doctrine says this should be above 0.8 (the “four-fifths rule”). Below that, you’re in trouble.
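To make the four-fifths rule concrete, here’s the arithmetic on illustrative selection rates like those from the audit above (the exact numbers will vary run to run):

```python
# Selection rates from the example run above (illustrative values)
sel_rates = {"Male": 0.30, "Female": 0.10}

# Demographic parity ratio: lowest group selection rate over highest
dp_ratio = min(sel_rates.values()) / max(sel_rates.values())
print(f"DP ratio: {dp_ratio:.2f}")  # 0.33 -- fails the four-fifths rule

# Demographic parity difference: highest minus lowest
dp_diff = max(sel_rates.values()) - min(sel_rates.values())
print(f"DP difference: {dp_diff:.2f}")  # 0.20
```

A ratio of 0.33 is less than half the 0.8 bar, which is why the baseline model above is so far out of compliance.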

Equalized odds difference checks whether the model’s true positive rate and false positive rate are equal across groups. This is stricter than demographic parity because it accounts for actual outcomes. If you’re building a lending model, equalized odds is what you want: it pushes the model toward equal error rates for all groups, not just an equal likelihood of saying “yes.”
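The metric itself reduces to simple arithmetic on the per-group rates. A hand-computed sketch with hypothetical TPR/FPR values, following the usual definition (the larger of the TPR gap and the FPR gap across groups):

```python
# Hypothetical per-group rates, for illustration only
tpr = {"Male": 0.62, "Female": 0.48}
fpr = {"Male": 0.09, "Female": 0.03}

# Gap between the best- and worst-off group on each error rate
tpr_gap = max(tpr.values()) - min(tpr.values())  # 0.14
fpr_gap = max(fpr.values()) - min(fpr.values())  # 0.06

# Equalized odds difference: the worse of the two gaps
eo_diff = max(tpr_gap, fpr_gap)
print(f"Equalized odds difference: {eo_diff:.2f}")  # 0.14
```

Here the TPR gap dominates: the model finds qualified women far less often than qualified men, even though its false positive rates are closer.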

My recommendation: start with demographic parity ratio as your primary metric. It’s easy to explain to stakeholders, it maps to legal standards, and it catches the most common forms of bias. Use equalized odds as a secondary check when you need error-rate fairness.

Visualizing Bias with MetricFrame

Numbers are good. Charts are better for getting buy-in from stakeholders.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Selection rate by group
mf_plot = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=A_test,
)
mf_plot.by_group.plot.bar(ax=axes[0], legend=False, color=["#2d6a4f", "#40916c"])
axes[0].set_title("Selection Rate by Sex")
axes[0].set_ylabel("Selection Rate")
axes[0].set_xlabel("")
axes[0].tick_params(axis="x", rotation=0)

# True positive rate by group
mf_tpr = MetricFrame(
    metrics={"true_positive_rate": true_positive_rate},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=A_test,
)
mf_tpr.by_group.plot.bar(ax=axes[1], legend=False, color=["#2d6a4f", "#40916c"])
axes[1].set_title("True Positive Rate by Sex")
axes[1].set_ylabel("TPR")
axes[1].set_xlabel("")
axes[1].tick_params(axis="x", rotation=0)

plt.tight_layout()
plt.savefig("fairness_audit.png", dpi=150)
plt.show()

This gives you a clear side-by-side comparison. When the bars are visibly different heights, you have a bias problem worth addressing.

Mitigating Bias with ThresholdOptimizer

ThresholdOptimizer is the fastest way to reduce bias without retraining your model. It adjusts the classification threshold per group so that a fairness constraint is satisfied. The trade-off: you lose some overall accuracy to gain fairness.

from fairlearn.postprocessing import ThresholdOptimizer

# Wrap the existing model with ThresholdOptimizer
to = ThresholdOptimizer(
    estimator=model,
    constraints="demographic_parity",
    objective="balanced_accuracy_score",
    prefit=True,
    predict_method="predict_proba",
)

to.fit(X_train, y_train, sensitive_features=A_train)
y_pred_to = to.predict(X_test, sensitive_features=A_test)

# Compare before and after
print("=== Before Mitigation ===")
print(f"  Accuracy:          {accuracy_score(y_test, y_pred):.4f}")
print(f"  DP difference:     {demographic_parity_difference(y_test, y_pred, sensitive_features=A_test):.4f}")
print(f"  DP ratio:          {demographic_parity_ratio(y_test, y_pred, sensitive_features=A_test):.4f}")

print("\n=== After ThresholdOptimizer ===")
print(f"  Accuracy:          {accuracy_score(y_test, y_pred_to):.4f}")
print(f"  DP difference:     {demographic_parity_difference(y_test, y_pred_to, sensitive_features=A_test):.4f}")
print(f"  DP ratio:          {demographic_parity_ratio(y_test, y_pred_to, sensitive_features=A_test):.4f}")

You’ll typically see accuracy drop by 1-3 percentage points while the demographic parity ratio jumps from 0.3 to above 0.8. That’s the trade-off, and in most cases it’s worth it.

ThresholdOptimizer works well when you just need to deploy a fair version of an existing model fast. It’s a post-processing step, so you don’t need to retrain anything.
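Conceptually, the adjustment looks like this — a simplified sketch, not Fairlearn’s actual internals, with made-up scores, groups, and thresholds:

```python
# Hypothetical probability scores and group membership
scores = [0.42, 0.42, 0.55, 0.55]
groups = ["Male", "Female", "Male", "Female"]

# Group-specific cutoffs (chosen by hand here) that equalize selection rates
thresholds = {"Male": 0.50, "Female": 0.40}

preds = [int(s >= thresholds[g]) for s, g in zip(scores, groups)]
print(preds)  # [0, 1, 1, 1]
```

Note that the same score (0.42) receives different labels depending on group. That per-group treatment is exactly why ThresholdOptimizer needs sensitive features at prediction time.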

Mitigating Bias with ExponentiatedGradient

For a more principled approach, ExponentiatedGradient retrains the model under a fairness constraint. It produces a randomized classifier that optimizes accuracy while keeping disparity within bounds you specify.

from fairlearn.reductions import (
    ExponentiatedGradient,
    DemographicParity,
    EqualizedOdds,
)

# Define constraint and objective
constraint = DemographicParity(difference_bound=0.01)
classifier = LogisticRegression(solver="liblinear", max_iter=1000)

mitigator = ExponentiatedGradient(
    estimator=classifier,
    constraints=constraint,
)

mitigator.fit(X_train, y_train, sensitive_features=A_train)
y_pred_eg = mitigator.predict(X_test)

print("=== After ExponentiatedGradient ===")
print(f"  Accuracy:          {accuracy_score(y_test, y_pred_eg):.4f}")
print(f"  DP difference:     {demographic_parity_difference(y_test, y_pred_eg, sensitive_features=A_test):.4f}")
print(f"  DP ratio:          {demographic_parity_ratio(y_test, y_pred_eg, sensitive_features=A_test):.4f}")
print(f"  EO difference:     {equalized_odds_difference(y_test, y_pred_eg, sensitive_features=A_test):.4f}")

ExponentiatedGradient usually achieves better accuracy-fairness trade-offs than ThresholdOptimizer because it actually learns a fair model instead of patching the output. Use it when you can afford to retrain.
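One caveat of the randomized classifier: repeated predict calls on the same rows can differ, because the fitted object is a weighted mixture of inner models (ExponentiatedGradient exposes them via attributes like predictors_ and weights_). A minimal sketch of the mixture idea, with made-up predictors and weights:

```python
import random

random.seed(42)

# Two hypothetical deterministic predictors and their mixture weights
def always_reject(x):
    return 0

def always_accept(x):
    return 1

predictors = [always_reject, always_accept]
weights = [0.7, 0.3]

def randomized_predict(x):
    # Sample one inner predictor according to the mixture weights
    chosen = random.choices(predictors, weights=weights, k=1)[0]
    return chosen(x)

preds = [randomized_predict(None) for _ in range(10)]
print(preds)  # a mix of 0s and 1s, roughly 70/30 over many calls
```

If your deployment requires deterministic predictions, fix a random seed at prediction time or audit whether the nondeterminism is acceptable for your use case.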

You can also swap DemographicParity for EqualizedOdds if you need error-rate parity:

constraint_eo = EqualizedOdds(difference_bound=0.02)
mitigator_eo = ExponentiatedGradient(
    estimator=classifier,
    constraints=constraint_eo,
)
mitigator_eo.fit(X_train, y_train, sensitive_features=A_train)

GridSearch for Exploring the Fairness-Accuracy Trade-Off

Sometimes you want to see the full Pareto frontier of models: what accuracy can you get at various levels of fairness? GridSearch trains multiple models across a range of constraints and lets you pick the one that fits your requirements.

from fairlearn.reductions import GridSearch

sweep = GridSearch(
    LogisticRegression(solver="liblinear", max_iter=1000),
    constraints=DemographicParity(),
    grid_size=31,
)

sweep.fit(X_train, y_train, sensitive_features=A_train)

# Evaluate all models
results = []
for i, predictor in enumerate(sweep.predictors_):
    preds = predictor.predict(X_test)
    results.append({
        "model": i,
        "accuracy": accuracy_score(y_test, preds),
        "dp_diff": demographic_parity_difference(y_test, preds, sensitive_features=A_test),
        "dp_ratio": demographic_parity_ratio(y_test, preds, sensitive_features=A_test),
    })

results_df = pd.DataFrame(results)

# Find the best model with DP ratio >= 0.8
fair_models = results_df[results_df["dp_ratio"] >= 0.8]
if not fair_models.empty:
    best = fair_models.loc[fair_models["accuracy"].idxmax()]
    print(f"Best fair model: #{int(best['model'])}")
    print(f"  Accuracy:  {best['accuracy']:.4f}")
    print(f"  DP ratio:  {best['dp_ratio']:.4f}")
    print(f"  DP diff:   {best['dp_diff']:.4f}")

GridSearch is excellent for stakeholder conversations. You can show them the exact accuracy cost of different fairness thresholds and let them decide what’s acceptable.
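A scatter of accuracy against demographic parity ratio makes the trade-off legible at a glance. This sketch uses hypothetical numbers; in the real pipeline you’d plot the results_df built from sweep.predictors_ above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in scripts and CI
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sweep results for illustration
results_df = pd.DataFrame({
    "accuracy": [0.85, 0.84, 0.83, 0.81],
    "dp_ratio": [0.35, 0.55, 0.78, 0.92],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(results_df["dp_ratio"], results_df["accuracy"])
ax.axvline(0.8, linestyle="--", color="gray", label="four-fifths rule")
ax.set_xlabel("Demographic parity ratio")
ax.set_ylabel("Accuracy")
ax.set_title("Fairness-Accuracy Trade-Off")
ax.legend()
fig.tight_layout()
fig.savefig("tradeoff.png", dpi=150)
```

Everything to the right of the dashed line is legally defensible under the four-fifths rule; the vertical spread shows what each level of fairness costs in accuracy.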

Integrating Fairness Checks into Your ML Pipeline

Fairness checks should run automatically, just like unit tests. Add them to your training pipeline so bias doesn’t silently creep back in when you retrain.

def fairness_gate(y_true, y_pred, sensitive_features, thresholds=None):
    """Check fairness metrics against thresholds. Returns True if all pass."""
    if thresholds is None:
        thresholds = {
            "dp_ratio_min": 0.8,
            "dp_diff_max": 0.1,
            "eo_diff_max": 0.1,
        }

    dp_ratio = demographic_parity_ratio(y_true, y_pred, sensitive_features=sensitive_features)
    dp_diff = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive_features)
    eo_diff = equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive_features)

    results = {
        "dp_ratio": {"value": dp_ratio, "threshold": thresholds["dp_ratio_min"], "pass": dp_ratio >= thresholds["dp_ratio_min"]},
        "dp_diff": {"value": dp_diff, "threshold": thresholds["dp_diff_max"], "pass": dp_diff <= thresholds["dp_diff_max"]},
        "eo_diff": {"value": eo_diff, "threshold": thresholds["eo_diff_max"], "pass": eo_diff <= thresholds["eo_diff_max"]},
    }

    all_passed = all(r["pass"] for r in results.values())

    for name, r in results.items():
        status = "PASS" if r["pass"] else "FAIL"
        print(f"  [{status}] {name}: {r['value']:.4f} (threshold: {r['threshold']})")

    return all_passed


# Use in your pipeline
print("=== Unmitigated Model ===")
passed = fairness_gate(y_test, y_pred, A_test)
print(f"  Gate passed: {passed}")

print("\n=== Mitigated Model ===")
passed = fairness_gate(y_test, y_pred_eg, A_test)
print(f"  Gate passed: {passed}")

Wire this into your CI/CD pipeline. If the fairness gate fails, block the deployment. Models that pass accuracy checks but fail fairness checks shouldn’t go to production.
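The CI wiring itself can be as small as an exit code. A sketch — here the gate result is stubbed as a boolean so the snippet stands alone; in practice you’d feed it the return value of the fairness_gate function above:

```python
import sys

def gate_exit_code(gate_passed: bool) -> int:
    """Map the fairness gate result to a process exit code for CI."""
    if not gate_passed:
        print("Fairness gate failed -- blocking deployment", file=sys.stderr)
        return 1
    return 0

# In the real pipeline:
#   sys.exit(gate_exit_code(fairness_gate(y_test, y_pred, A_test)))
print(gate_exit_code(True), gate_exit_code(False))  # 0 1
```

Any CI system (GitHub Actions, GitLab CI, Jenkins) treats a nonzero exit code as a failed step, so the deployment stage simply never runs.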

Common Errors and Fixes

ModuleNotFoundError: No module named 'fairlearn.datasets' – You’re on an older version of Fairlearn that predates the datasets module. Upgrade to the latest:

pip install --upgrade fairlearn

ImportError: cannot import name 'MetricFrame' from 'fairlearn.metrics' – You’re on Fairlearn 0.5.x or earlier, which used group_summary instead of MetricFrame; group_summary was removed in 0.7.0. Upgrade:

pip install "fairlearn>=0.13.0"

ValueError: sensitive_features and y are not the same length – Your indices are misaligned after train_test_split. Always reset indices on all arrays after splitting:

X_train = X_train.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)

This is the most common mistake people make with Fairlearn. The library uses index alignment, so if your DataFrame indices don’t match, you get silent misalignment or this error.
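A defensive alternative to resetting indices is to pass plain numpy arrays, which carry no index to misalign. A small sketch of the failure mode and the fix (the Series and index values are made up):

```python
import pandas as pd

# Two same-length Series whose indices don't line up
y_true = pd.Series([1, 0, 1], index=[10, 11, 12])
sensitive = pd.Series(["a", "b", "a"], index=[0, 1, 2])

# Index-based alignment would pair these incorrectly (or produce NaNs);
# converting to numpy arrays strips the index entirely
y_arr = y_true.to_numpy()
s_arr = sensitive.to_numpy()
print(len(y_arr), len(s_arr))  # 3 3
```

Passing y_arr and s_arr instead of the raw Series sidesteps the alignment problem entirely, at the cost of losing the labeled index in your outputs.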

sklearn.exceptions.NotFittedError when using ThresholdOptimizer with prefit=True – You passed prefit=True but the estimator isn’t actually fitted. Either fit your model first or set prefit=False to let ThresholdOptimizer fit it internally:

# Option 1: fit first, then pass prefit=True
model.fit(X_train, y_train)
to = ThresholdOptimizer(estimator=model, constraints="demographic_parity", prefit=True)

# Option 2: let ThresholdOptimizer fit it
to = ThresholdOptimizer(estimator=model, constraints="demographic_parity", prefit=False)
to.fit(X_train, y_train, sensitive_features=A_train)

TypeError: predict_proba is not available in ThresholdOptimizer – Your estimator doesn’t support predict_proba. Either switch to an estimator that does (like LogisticRegression) or set predict_method="decision_function":

to = ThresholdOptimizer(
    estimator=svm_model,
    constraints="demographic_parity",
    predict_method="decision_function",
    prefit=True,
)

ExponentiatedGradient converges very slowly or doesn’t converge – Cap ExponentiatedGradient’s own max_iter, increase max_iter in the base estimator, and try relaxing the constraint bound:

constraint = DemographicParity(difference_bound=0.05)  # Relaxed from 0.01
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=5000),
    constraints=constraint,
    max_iter=50,
)

Which Mitigation Method to Choose

Here’s my take, based on deploying these in production:

ThresholdOptimizer when you already have a trained model and need to make it fair without retraining. Fast, simple, works as a post-processing step. Downside: you need sensitive features at prediction time, which you may not always have.

ExponentiatedGradient when you’re training from scratch and want the best accuracy-fairness trade-off. It’s the most principled approach and usually produces better results than ThresholdOptimizer. Use it as your default.

GridSearch when you need to present options to stakeholders or explore the full trade-off space. Great for regulatory compliance discussions where you need to show you evaluated multiple configurations.

Start with ExponentiatedGradient. If it’s too slow or you can’t retrain, fall back to ThresholdOptimizer. Use GridSearch for exploration and reporting.