The Bias Report in Action
Using a clean version of the COMPAS dataset, we demonstrate the use of The Bias Report web app. Below, we provide background on the dataset, a description of the process, and an analysis of the results.
Background
In 2016, ProPublica reported on racial bias in COMPAS, a criminal risk assessment tool. They showed the algorithm produced unfair disparities in False Negative and False Positive Rates: black defendants who would not go on to recidivate faced disproportionately high risk scores, while white defendants who would recidivate received disproportionately low risk scores. Northpointe, the company responsible for the algorithm, responded by arguing that it had calibrated the algorithm to be fair in terms of False Discovery Rate, also known as calibration. With the Bias Report, we get metrics on each type of disparity, adding clarity to the bias auditing process.
The Process
- Upload data
- Select Protected Groups
- Select Fairness Metrics
- Choose threshold
First, we upload the data. The cleaned dataset is available on the upload page and follows the format described here.
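As a rough sketch of that format: each row describes one individual, with a binary score (the model's decision), a binary label_value (the observed outcome), and one column per attribute to audit. The column names score and label_value follow the Aequitas input specification; the rows below are invented for illustration.

```python
import pandas as pd

# Illustrative rows only; 'score' is the binary model decision and
# 'label_value' is the observed outcome, per the Aequitas input format.
df = pd.DataFrame({
    "score":       [1, 0, 1, 0],
    "label_value": [1, 0, 0, 1],
    "race":        ["African-American", "Caucasian", "Hispanic", "Caucasian"],
})
print(df)
```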
Following the ProPublica-Northpointe debate, we focus on race. We select a custom reference group, using Caucasian as the reference, so that our metrics reflect fairness relative to the historically dominant group.
Again following the debate, we select False Positive Rates, False Negative Rates and False Discovery Rates.
We stick with the default value of 80 percent. This means that any group metric between 80 and 125 percent of the reference group's metric is considered fair, and any metric outside that range is considered unfair.
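These same four steps can also be run programmatically with the aequitas Python library instead of the web app. The sketch below follows the class and method names in the aequitas documentation (Group, Bias, Fairness, get_crosstabs, get_disparity_predefined_groups, get_group_value_fairness); treat the exact signatures, and the hypothetical file name, as assumptions to check against your installed version.

```python
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness

# Step 1: load the cleaned COMPAS data (hypothetical local file name).
df = pd.read_csv("compas_for_aequitas.csv")

# Cross-tabulate counts and group metrics for each attribute value.
g = Group()
xtab, _ = g.get_crosstabs(df)

# Steps 2-3: disparities relative to a custom reference group (Caucasian).
b = Bias()
bias_df = b.get_disparity_predefined_groups(
    xtab, original_df=df, ref_groups_dict={"race": "Caucasian"}
)

# Step 4: apply the fairness threshold; tau=0.8 is the 80 percent rule.
f = Fairness(tau=0.8)
fairness_df = f.get_group_value_fairness(bias_df)
print(fairness_df.head())
```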
Analysis
Scrolling through the Bias Report, we see that the African-American false discovery rate is within the bounds of fairness. This result is expected because COMPAS is calibrated. (The overall FDR parity check still fails, because Asian and Native American defendants did not fall within the fairness thresholds for FDR.) On the other hand, African-Americans are roughly twice as likely to receive false positives and about 40 percent less likely to receive false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair, and it is marked Failed below. These findings reflect an inherent trade-off between FPR fairness, FNR fairness, and calibration, one that is present in any decision system where base rates are not equal; see Chouldechova (2017). Aequitas brings this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
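To make the arithmetic concrete, here is a minimal sketch that reproduces those headline disparities from the rates quoted above (44.8% vs. 23.4% for false positives; 0.28 vs. 0.48 for false negatives, from the group metrics table at the end of the report). The 0.8-1.25 band is the fairness threshold chosen earlier.

```python
# Group metrics quoted above (reference group: Caucasian).
fpr = {"African-American": 0.448, "Caucasian": 0.234}  # false positive rates
fnr = {"African-American": 0.28, "Caucasian": 0.48}    # false negative rates

def is_fair(disparity, tau=0.8):
    """80 percent rule: fair if the ratio falls within [0.8, 1.25]."""
    return tau <= disparity <= 1 / tau

fpr_disparity = fpr["African-American"] / fpr["Caucasian"]  # ~1.91
fnr_disparity = fnr["African-American"] / fnr["Caucasian"]  # ~0.58 (0.59X in the report, up to rounding)

print(f"FPR disparity: {fpr_disparity:.2f}X, fair: {is_fair(fpr_disparity)}")
print(f"FNR disparity: {fnr_disparity:.2f}X, fair: {is_fair(fnr_disparity)}")
```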
The Bias Report
Audit Date: 04 Jun 2018
Data Audited: 7214 rows
Attributes Audited: race
Audit Goal(s):
- False Positive Rate Parity - Ensure all protected groups have the same false positive rates as the reference group.
- False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group).
- False Negative Rate Parity - Ensure all protected groups have the same false negative rates as the reference group.
Reference Groups: Custom group - The reference groups you selected for each attribute will be used to calculate relative disparities in this audit.
Fairness Threshold: 80%. If disparity for a group is within 80% and 125% of the value of the reference group on a group metric (e.g. False Positive Rate), this audit will pass.
Audit Results:
Audit Results: Summary
Audit Goal | Result |
---|---|
False Positive Rate Parity - Ensure all protected groups have the same false positive rates as the reference group. | Failed |
False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group). | Failed |
False Negative Rate Parity - Ensure all protected groups have the same false negative rates as the reference group. | Failed |
Audit Results: Details by Fairness Measures
False Positive Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Positive parity if every group has the same False Positive Error Rate. For example, if race has false positive parity, it implies that all race groups have the same False Positive Error Rate. | If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and has a risk of adverse outcomes for individuals. Using this criterion allows you to make sure that you are not making false positive mistakes about any single group disproportionately. | For race (reference group: Caucasian): Asian with 0.37X Disparity; African-American with 1.91X Disparity; Native American with 1.60X Disparity; Other with 0.63X Disparity |
False Discovery Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Discovery Rate parity if every group has the same False Discovery Error Rate. For example, if race has false discovery parity, it implies that all race groups have the same False Discovery Error Rate. | If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and can hurt individuals and where you are selecting a very small group for interventions. | For race (reference group: Caucasian): Native American with 0.61X Disparity; Asian with 0.61X Disparity |
False Negative Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Negative parity if every group has the same False Negative Error Rate. For example, if race has false negative parity, it implies that all race groups have the same False Negative Error Rate. | If your desired outcome is to make false negative errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is assistive (providing helpful social services, for example) and missing an individual could lead to adverse outcomes for them. Using this criterion allows you to make sure that you're not missing people from certain groups disproportionately. | For race (reference group: Caucasian): Native American with 0.21X Disparity; African-American with 0.59X Disparity; Asian with 0.70X Disparity; Other with 1.42X Disparity |
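Stepping outside the report for a moment: a short sketch applying the 80%/125% band to every disparity flagged above (values copied from the three tables; reference group Caucasian) confirms that each one falls outside the fair range.

```python
TAU = 0.8  # fair band is [0.80, 1.25]

# Disparities flagged as failures in the tables above.
failed = {
    "FPR": {"Asian": 0.37, "African-American": 1.91,
            "Native American": 1.60, "Other": 0.63},
    "FDR": {"Native American": 0.61, "Asian": 0.61},
    "FNR": {"Native American": 0.21, "African-American": 0.59,
            "Asian": 0.70, "Other": 1.42},
}

for metric, groups in failed.items():
    for group, disparity in groups.items():
        fair = TAU <= disparity <= 1 / TAU
        print(f"{metric} {group}: {disparity:.2f}X -> {'pass' if fair else 'fail'}")
```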
Audit Results: Details by Protected Attributes
race
(per-attribute detail charts not reproduced in this excerpt)
Audit Results: Bias Metrics Values
race
(disparity value charts not reproduced in this excerpt)
Audit Results: Group Metrics Values
race
Attribute Value | Group Size Ratio | False Discovery Rate | False Positive Rate | False Negative Rate |
---|---|---|---|---|
African-American | 0.51 | 0.37 | 0.45 | 0.28 |
Asian | 0 | 0.25 | 0.09 | 0.33 |
Caucasian | 0.34 | 0.41 | 0.23 | 0.48 |
Hispanic | 0.09 | 0.46 | 0.21 | 0.56 |
Native American | 0 | 0.25 | 0.38 | 0.1 |
Other | 0.05 | 0.46 | 0.15 | 0.68 |
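For reference, each rate in the table above is computed from a group's confusion-matrix counts. A minimal sketch of the definitions (the counts shown are invented for illustration, not the actual COMPAS totals):

```python
def group_error_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Error rates used in this report, from one group's confusion counts."""
    return {
        "False Discovery Rate": fp / (fp + tp),  # share of predicted positives that are wrong
        "False Positive Rate":  fp / (fp + tn),  # share of labeled negatives flagged positive
        "False Negative Rate":  fn / (fn + tp),  # share of labeled positives missed
    }

# Invented counts for illustration only.
print(group_error_rates(tp=120, fp=60, tn=200, fn=40))
```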