
The Bias Report in Action

Using a clean version of the COMPAS dataset, we demonstrate the use of the Bias Report web app. Below we give background on the dataset, describe the process, and analyze the results.

Background

In 2016, ProPublica reported on racial inequality in COMPAS, a risk assessment tool. They showed that the algorithm led to unfair disparities in False Negative and False Positive Rates. In particular, they showed that black defendants who would not go on to recidivate faced disproportionately high risk scores, while white defendants who would recidivate received disproportionately low risk scores. Northpointe, the company responsible for the algorithm, responded by arguing that they had calibrated the algorithm to be fair in terms of False Discovery Rate, a criterion also known as calibration. With the Bias Report, we get metrics on each type of disparity, adding clarity to the bias auditing process.

The Process

  1. Upload data
     First, we upload the data. The cleaned dataset is available on the upload page and follows the format described here.

  2. Select Protected Groups
     Following the ProPublica-Northpointe debate, we focus on race. We select a custom reference group and use Caucasian as the reference group. Our metrics will thus reflect fairness in relation to the historically dominant group.

  3. Select Fairness Metrics
     Again following the debate, we select False Positive Rates, False Negative Rates and False Discovery Rates.

  4. Choose threshold
     We stick with the default value of 80 percent. This means that any group metric that is between 80 and 125 percent of the reference group metric is considered fair and any metric outside that range is considered unfair.
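The threshold rule in the last step can be sketched in a few lines of Python. This is a minimal illustration rather than the web app's actual code; the function name `within_fairness_bounds` is hypothetical, and the only assumption is the rule stated above, namely that a disparity ratio passes when it lies between the threshold and its reciprocal (0.8 and 1.25 for the default).

```python
def within_fairness_bounds(disparity: float, threshold: float = 0.8) -> bool:
    """Return True if a disparity ratio lies in [threshold, 1 / threshold].

    `disparity` is a group's metric divided by the reference group's metric,
    so the reference group itself always has a disparity of 1.0.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return threshold <= disparity <= 1 / threshold


# Example disparity ratios relative to the reference group.
for ratio in (0.91, 1.91, 0.59):
    verdict = "fair" if within_fairness_bounds(ratio) else "unfair"
    print(f"{ratio:.2f}X -> {verdict}")
```

Using the reciprocal as the upper bound keeps the rule symmetric: a group whose rate is 25 percent above the reference is treated the same as a reference group whose rate is 25 percent above the group's.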

Analysis

Scrolling through the Bias Report, we see that the African-American false discovery rates are within the bounds of fairness. This result is expected because COMPAS is calibrated. (The overall FDR fairness check still fails because Asian and Native American defendants did not fall within the fairness thresholds for FDR.) On the other hand, African-Americans are roughly twice as likely to have false positives and 40 percent less likely to have false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair and is marked as failed in the report below. These findings reflect an inherent trade-off between FPR fairness, FNR fairness and calibration, which is present in any decision system where base rates are not equal. See Chouldechova (2017). Aequitas helps bring this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
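The trade-off can be made concrete with the identity that relates the three error rates to a group's base rate p (the share of the group that actually recidivates): FPR = p / (1 - p) * FDR / (1 - FDR) * (1 - FNR). Chouldechova (2017) states it in terms of positive predictive value, where FDR = 1 - PPV. The sketch below is illustrative only. The function names are hypothetical, and the inputs are the rounded African-American and Caucasian group metrics from the report further down, so the implied base rates are approximate.

```python
def fpr_from_identity(base_rate: float, fdr: float, fnr: float) -> float:
    """False positive rate implied by a base rate, FDR and FNR."""
    odds = base_rate / (1 - base_rate)
    return odds * fdr / (1 - fdr) * (1 - fnr)


def implied_base_rate(fdr: float, fpr: float, fnr: float) -> float:
    """Solve the same identity for the base rate."""
    odds = fpr * (1 - fdr) / (fdr * (1 - fnr))
    return odds / (1 + odds)


# (FDR, FPR, FNR), rounded values taken from the report's group metrics.
groups = {
    "African-American": (0.37, 0.45, 0.28),
    "Caucasian": (0.41, 0.23, 0.48),
}

for name, (fdr, fpr, fnr) in groups.items():
    print(f"{name}: implied base rate ~ {implied_base_rate(fdr, fpr, fnr):.2f}")

# Counterfactual: keep the African-American base rate but impose the
# Caucasian FDR and FNR. The identity then fixes the FPR.
p_aa = implied_base_rate(*groups["African-American"])
print(f"FPR forced by equal FDR and FNR: {fpr_from_identity(p_aa, 0.41, 0.48):.2f}")
```

With these rounded inputs the implied base rates come out at roughly 0.52 and 0.39. Because they differ, giving both groups the same FDR and FNR would still leave the African-American FPR (about 0.38) well above the Caucasian value of 0.23, so the three parity criteria cannot all hold at once.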


The Bias Report

 

Audit Date: 04 Jun 2018
Data Audited: 7214 rows
Attributes Audited: race
Audit Goal(s): False Positive Rate Parity - Ensure all protected groups have the same false positive rates (as the reference group).
False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group).
False Negative Rate Parity - Ensure all protected groups have the same false negative rates (as the reference group).
Reference Groups: Custom group - The reference groups you selected for each attribute will be used to calculate relative disparities in this audit.
Fairness Threshold: 80%. If the disparity for a group is between 80% and 125% of the value of the reference group on a group metric (e.g. False Positive Rate), this audit will pass.
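For readers unfamiliar with the three group metrics named in the audit goals, the sketch below shows their standard confusion-matrix definitions. It is not the tool's code. The class name `ConfusionCounts` and the counts are made up purely for illustration and are not taken from the COMPAS data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for one group (illustrative only)."""
    tp: int  # flagged high risk, did recidivate
    fp: int  # flagged high risk, did not recidivate
    tn: int  # flagged low risk, did not recidivate
    fn: int  # flagged low risk, did recidivate

    @property
    def false_positive_rate(self) -> float:
        # Share of actual negatives that were wrongly flagged.
        return self.fp / (self.fp + self.tn)

    @property
    def false_negative_rate(self) -> float:
        # Share of actual positives that were missed.
        return self.fn / (self.fn + self.tp)

    @property
    def false_discovery_rate(self) -> float:
        # Share of flagged individuals who were wrongly flagged.
        return self.fp / (self.fp + self.tp)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
toy = ConfusionCounts(tp=60, fp=40, tn=160, fn=40)
print(toy.false_positive_rate, toy.false_negative_rate, toy.false_discovery_rate)
```

The false positive rate and the false discovery rate share a numerator but differ in their denominators: the first is measured against everyone who did not recidivate, the second against everyone who was flagged.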

 


 

Audit Results:

  1. Summary

  2. Details by Fairness Measures

  3. Details by Protected Attributes

  4. Bias Metrics Values

  5. Base Metrics Calculated for Each Group

 


 

Audit Results: Summary

False Positive Rate Parity - Ensure all protected groups have the same false positive rates (as the reference group). Result: Failed
False Discovery Rate Parity - Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group). Result: Failed
False Negative Rate Parity - Ensure all protected groups have the same false negative rates (as the reference group). Result: Failed

 


 

Audit Results: Details by Fairness Measures

 

False Positive Rate Parity: Failed

What is it?
This criterion considers an attribute to have False Positive parity if every group has the same False Positive Error Rate. For example, if race has false positive parity, it implies that all race groups have the same False Positive Error Rate.

When does it matter?
If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and has a risk of adverse outcomes for individuals. Using this criterion allows you to make sure that you are not making false positive mistakes about any single group disproportionately.

Which groups failed the audit:
For race (with reference group as Caucasian)
   Asian with 0.37X Disparity
   African-American with 1.91X Disparity
   Native American with 1.60X Disparity
   Other with 0.63X Disparity

 


 

 


 

 

False Discovery Rate Parity: Failed

What is it?
This criterion considers an attribute to have False Discovery Rate parity if every group has the same False Discovery Error Rate. For example, if race has false discovery parity, it implies that all race groups have the same False Discovery Error Rate.

When does it matter?
If your desired outcome is to make false positive errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is punitive and can hurt individuals and where you are selecting a very small group for interventions.

Which groups failed the audit:
For race (with reference group as Caucasian)
   Native American with 0.61X Disparity
   Asian with 0.61X Disparity

 


 

 


 

 

False Negative Rate Parity: Failed

What is it?
This criterion considers an attribute to have False Negative parity if every group has the same False Negative Error Rate. For example, if race has false negative parity, it implies that all race groups have the same False Negative Error Rate.

When does it matter?
If your desired outcome is to make false negative errors equally on people from all races, then you care about this criterion. This is important in cases where your intervention is assistive (providing helpful social services for example) and missing an individual could lead to adverse outcomes for them. Using this criterion allows you to make sure that you're not missing people from certain groups disproportionately.

Which groups failed the audit:
For race (with reference group as Caucasian)
   Native American with 0.21X Disparity
   African-American with 0.59X Disparity
   Asian with 0.70X Disparity
   Other with 1.42X Disparity

 


 

 


 


Audit Results: Details by Protected Attributes

 

race

 

Attribute Value  | False Discovery Rate Parity | False Positive Rate Parity | False Negative Rate Parity
African-American | Passed                      | Failed                     | Failed
Asian            | Failed                      | Failed                     | Failed
Caucasian        | Ref                         | Ref                        | Ref
Hispanic         | Passed                      | Passed                     | Passed
Native American  | Failed                      | Failed                     | Failed
Other            | Passed                      | Failed                     | Failed


 

 


Audit Results: Bias Metrics Values

 

race

 

Attribute Value  | False Discovery Rate Disparity | False Positive Rate Disparity | False Negative Rate Disparity
African-American | 0.91                           | 1.91                          | 0.59
Asian            | 0.61                           | 0.37                          | 0.70
Caucasian        | 1.00                           | 1.00                          | 1.00
Hispanic         | 1.12                           | 0.92                          | 1.17
Native American  | 0.61                           | 1.60                          | 0.21
Other            | 1.12                           | 0.63                          | 1.42


 

 


Audit Results: Group Metrics Values

 

race

 

Attribute Value  | Group Size Ratio | False Discovery Rate | False Positive Rate | False Negative Rate
African-American | 0.51             | 0.37                 | 0.45                | 0.28
Asian            | 0.00             | 0.25                 | 0.09                | 0.33
Caucasian        | 0.34             | 0.41                 | 0.23                | 0.48
Hispanic         | 0.09             | 0.46                 | 0.21                | 0.56
Native American  | 0.00             | 0.25                 | 0.38                | 0.10
Other            | 0.05             | 0.46                 | 0.15                | 0.68
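The disparity values reported earlier are each group's metric divided by the reference group's metric. The sketch below recomputes them from the rounded group metrics in this table, with Caucasian as the reference. It is an illustration under that assumption, not the report's own code, and the names `GROUP_METRICS` and `disparities` are hypothetical. Because the inputs are rounded to two decimals, the ratios can differ from the reported disparities by a few hundredths.

```python
# Rounded group metrics copied from the table above.
GROUP_METRICS = {
    "African-American": {"fdr": 0.37, "fpr": 0.45, "fnr": 0.28},
    "Asian": {"fdr": 0.25, "fpr": 0.09, "fnr": 0.33},
    "Caucasian": {"fdr": 0.41, "fpr": 0.23, "fnr": 0.48},
    "Hispanic": {"fdr": 0.46, "fpr": 0.21, "fnr": 0.56},
    "Native American": {"fdr": 0.25, "fpr": 0.38, "fnr": 0.10},
    "Other": {"fdr": 0.46, "fpr": 0.15, "fnr": 0.68},
}
REFERENCE = "Caucasian"


def disparities(metrics: dict, reference: str) -> dict:
    """Divide every group's metrics by the reference group's metrics."""
    ref = metrics[reference]
    return {
        group: {name: value / ref[name] for name, value in values.items()}
        for group, values in metrics.items()
    }


for group, ratios in disparities(GROUP_METRICS, REFERENCE).items():
    print(group, {name: round(ratio, 2) for name, ratio in ratios.items()})
```

For example, the rounded African-American FPR ratio is 0.45 / 0.23, or about 1.96, whereas the unrounded rates quoted in the analysis (44.8% and 23.4%) give 1.91, which matches the reported disparity.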
