The Bias Report in Action
Using a clean version of the COMPAS dataset, we demonstrate the use of The Bias Report web app. Below, we provide background on the dataset, a description of the process, and an analysis of the results.
Background
In 2016, ProPublica reported on racial bias in COMPAS, a criminal risk assessment tool. They showed the algorithm produced unfair disparities in False Negative and False Positive Rates: black defendants who would not go on to recidivate faced disproportionately high risk scores, while white defendants who would recidivate received disproportionately low risk scores. Northpointe, the company responsible for the algorithm, responded by arguing that it had calibrated the algorithm to be fair in terms of False Discovery Rate, also known as calibration. With the Bias Report, we get metrics on each type of disparity, adding clarity to the bias auditing process.
The Process
- Upload data
- Select Protected Groups
- Select Fairness Metrics
- Choose threshold
First, we upload the data. The cleaned dataset is available on the upload page and follows the format described here.
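As a rough sketch of that format: each row describes one individual, with a binary score (the model's decision), a binary label_value (the observed outcome), and one column per attribute to audit. The column names score and label_value follow the Aequitas input specification; the rows below are invented for illustration.

```python
import pandas as pd

# Illustrative rows only; 'score' is the binary model decision and
# 'label_value' is the observed outcome, per the Aequitas input format.
df = pd.DataFrame({
    "score":       [1, 0, 1, 0],
    "label_value": [1, 0, 0, 1],
    "race":        ["African-American", "Caucasian", "Hispanic", "Caucasian"],
})
print(df)
```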
Following the ProPublica-Northpointe debate, we focus on race. We select a custom reference group, using Caucasian as the reference, so that our metrics reflect fairness relative to the historically dominant group.
Again following the debate, we select False Positive Rates, False Negative Rates and False Discovery Rates.
We stick with the default value of 80 percent. This means that any group metric between 80 and 125 percent of the reference group's metric is considered fair, and any metric outside that range is considered unfair.
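These same four steps can also be run programmatically with the aequitas Python library instead of the web app. The sketch below follows the class and method names in the aequitas documentation (Group, Bias, Fairness, get_crosstabs, get_disparity_predefined_groups, get_group_value_fairness); treat the exact signatures, and the hypothetical file name, as assumptions to check against your installed version.

```python
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness

# Step 1: load the cleaned COMPAS data (hypothetical local file name).
df = pd.read_csv("compas_for_aequitas.csv")

# Cross-tabulate counts and group metrics for each attribute value.
g = Group()
xtab, _ = g.get_crosstabs(df)

# Steps 2-3: disparities relative to a custom reference group (Caucasian).
b = Bias()
bias_df = b.get_disparity_predefined_groups(
    xtab, original_df=df, ref_groups_dict={"race": "Caucasian"}
)

# Step 4: apply the fairness threshold; tau=0.8 is the 80 percent rule.
f = Fairness(tau=0.8)
fairness_df = f.get_group_value_fairness(bias_df)
print(fairness_df.head())
```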
Analysis
Scrolling through the Bias Report, we see that the African-American false discovery rate is within the bounds of fairness. This result is expected because COMPAS is calibrated. (The overall FDR parity check still fails, because Asian and Native American defendants did not fall within the fairness thresholds for FDR.) On the other hand, African-Americans are roughly twice as likely to receive false positives and about 40 percent less likely to receive false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair, and it is marked Failed below. These findings reflect an inherent trade-off between FPR fairness, FNR fairness, and calibration, one that is present in any decision system where base rates are not equal; see Chouldechova (2017). Aequitas brings this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
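To make the arithmetic concrete, here is a minimal sketch that reproduces those headline disparities from the rates quoted above (44.8% vs. 23.4% for false positives; 0.28 vs. 0.48 for false negatives, from the group metrics table at the end of the report). The 0.8-1.25 band is the fairness threshold chosen earlier.

```python
# Group metrics quoted above (reference group: Caucasian).
fpr = {"African-American": 0.448, "Caucasian": 0.234}  # false positive rates
fnr = {"African-American": 0.28, "Caucasian": 0.48}    # false negative rates

def is_fair(disparity, tau=0.8):
    """80 percent rule: fair if the ratio falls within [0.8, 1.25]."""
    return tau <= disparity <= 1 / tau

fpr_disparity = fpr["African-American"] / fpr["Caucasian"]  # ~1.91
fnr_disparity = fnr["African-American"] / fnr["Caucasian"]  # ~0.58 (0.59X in the report, up to rounding)

print(f"FPR disparity: {fpr_disparity:.2f}X, fair: {is_fair(fpr_disparity)}")
print(f"FNR disparity: {fnr_disparity:.2f}X, fair: {is_fair(fnr_disparity)}")
```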
The Bias Report
Audit Date: 04 Jun 2018
Data Audited: 7214 rows
Attributes Audited: race
Audit Goal(s):
- False Positive Rate Parity - Ensure all protected groups have the same false positive rates as the reference group.
- False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group).
- False Negative Rate Parity - Ensure all protected groups have the same false negative rates as the reference group.
Reference Groups: Custom group - The reference groups you selected for each attribute will be used to calculate relative disparities in this audit.
Fairness Threshold: 80%. If disparity for a group is within 80% and 125% of the value of the reference group on a group metric (e.g. False Positive Rate), this audit will pass.
Audit Results:
Audit Results: Summary
Audit Goal | Result |
---|---|
False Positive Rate Parity - Ensure all protected groups have the same false positive rates as the reference group. | Failed |
False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group). | Failed |
False Negative Rate Parity - Ensure all protected groups have the same false negative rates as the reference group. | Failed |
Audit Results: Details by Fairness Measures
False Positive Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Positive parity if every group has the same False Positive Error Rate. For example, if race has false positive parity, it implies that all race groups have the same False Positive Error Rate. | If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and has a risk of adverse outcomes for individuals. Using this criterion allows you to make sure that you are not making false positive mistakes about any single group disproportionately. | For race (reference group: Caucasian): Asian with 0.37X Disparity; African-American with 1.91X Disparity; Native American with 1.60X Disparity; Other with 0.63X Disparity |
False Discovery Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Discovery Rate parity if every group has the same False Discovery Error Rate. For example, if race has false discovery parity, it implies that all race groups have the same False Discovery Error Rate. | If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and can hurt individuals and where you are selecting a very small group for interventions. | For race (reference group: Caucasian): Native American with 0.61X Disparity; Asian with 0.61X Disparity |
False Negative Rate Parity: Failed
What is it? | When does it matter? | Which groups failed the audit: |
---|---|---|
This criterion considers an attribute to have False Negative parity if every group has the same False Negative Error Rate. For example, if race has false negative parity, it implies that all race groups have the same False Negative Error Rate. | If your desired outcome is to make false negative errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is assistive (providing helpful social services, for example) and missing an individual could lead to adverse outcomes for them. Using this criterion allows you to make sure that you're not missing people from certain groups disproportionately. | For race (reference group: Caucasian): Native American with 0.21X Disparity; African-American with 0.59X Disparity; Asian with 0.70X Disparity; Other with 1.42X Disparity |
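Stepping outside the report for a moment: a short sketch applying the 80%/125% band to every disparity flagged above (values copied from the three tables; reference group Caucasian) confirms that each one falls outside the fair range.

```python
TAU = 0.8  # fair band is [0.80, 1.25]

# Disparities flagged as failures in the tables above.
failed = {
    "FPR": {"Asian": 0.37, "African-American": 1.91,
            "Native American": 1.60, "Other": 0.63},
    "FDR": {"Native American": 0.61, "Asian": 0.61},
    "FNR": {"Native American": 0.21, "African-American": 0.59,
            "Asian": 0.70, "Other": 1.42},
}

for metric, groups in failed.items():
    for group, disparity in groups.items():
        fair = TAU <= disparity <= 1 / TAU
        print(f"{metric} {group}: {disparity:.2f}X -> {'pass' if fair else 'fail'}")
```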
Audit Results: Details by Protected Attributes
race
(per-attribute detail charts not reproduced in this excerpt)
Audit Results: Bias Metrics Values
race
(disparity value charts not reproduced in this excerpt)
Audit Results: Group Metrics Values
race
Attribute Value | Group Size Ratio | False Discovery Rate | False Positive Rate | False Negative Rate |
---|---|---|---|---|
African-American | 0.51 | 0.37 | 0.45 | 0.28 |
Asian | 0 | 0.25 | 0.09 | 0.33 |
Caucasian | 0.34 | 0.41 | 0.23 | 0.48 |
Hispanic | 0.09 | 0.46 | 0.21 | 0.56 |
Native American | 0 | 0.25 | 0.38 | 0.1 |
Other | 0.05 | 0.46 | 0.15 | 0.68 |
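For reference, each rate in the table above is computed from a group's confusion-matrix counts. A minimal sketch of the definitions (the counts shown are invented for illustration, not the actual COMPAS totals):

```python
def group_error_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Error rates used in this report, from one group's confusion counts."""
    return {
        "False Discovery Rate": fp / (fp + tp),  # share of predicted positives that are wrong
        "False Positive Rate":  fp / (fp + tn),  # share of labeled negatives flagged positive
        "False Negative Rate":  fn / (fn + tp),  # share of labeled positives missed
    }

# Invented counts for illustration only.
print(group_error_rates(tp=120, fp=60, tn=200, fn=40))
```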