
Gender and Ethnicity Bias Measurement

Trueface Reduces Bias Across All Ethnicities with Newest Face Recognition Model - March 2021

Summary:

At an operating threshold of 0.4, TFV5 correctly rejects 9,999 of every 10,000 impostor comparisons, a False Positive Rate of 0.01%. Raising the operating threshold will reduce this False Positive Rate further.
TFV5 saw a decrease in bias across all ethnicity and gender groups.
Of equal importance, TFV5 saw a significant reduction in bias in traditionally underrepresented groups such as East Asians and Southeast Asians.

At Trueface, we are committed to eliminating bias in our face recognition models. Machine learning algorithms should not discriminate against any ethnicity or minority group. Our research and engineering teams are working diligently to ensure our models achieve parity of performance across all demographics.

If you require a refresher on algorithmic bias, check out this article my colleague wrote which defines algorithmic bias and outlines a few of the strategies for mitigating bias in machine learning models.

In a previous article, I introduced Fairface, a unique dataset that contains roughly the same number of face images from each ethnicity and gender group. In that article, I quantified the bias in the Trueface production face recognition model. In the interest of continued transparency, I will share the results of the same evaluation run on our newest face recognition model, TFV5.

The Fairface dataset contains a balanced number of face images from seven major ethnic groups and contains no more than a single image for each identity. In the evaluation, we generate a face recognition template for each image in the dataset, then compare every template against every other template to generate a similarity score. Generally, when quantifying the performance of a face recognition model, we generate and plot a Detection Error Tradeoff (DET) curve, which plots the False Negative Rate against the False Positive Rate. However, because the dataset holds at most one image per identity, every comparison in our evaluation is an impostor match (a comparison of two different identities), so there are no genuine matches from which to measure a False Negative Rate. We therefore plot the False Positive Rate (FPR) against the similarity threshold instead. A lower, flatter curve indicates better performance because it means there are fewer false positives at a given threshold.
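The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Trueface's actual pipeline: it assumes templates are plain float vectors, that the similarity score is cosine similarity, and that a comparison counts as a match when its score is at or above the threshold. The function names (`impostor_scores`, `fpr_curve`) are hypothetical.

```python
import itertools
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length template vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def impostor_scores(templates):
    """Score every unordered pair of templates.

    With at most one image per identity, every pair is an impostor comparison.
    """
    return [cosine_similarity(a, b) for a, b in itertools.combinations(templates, 2)]


def fpr_curve(scores, thresholds):
    """False Positive Rate at each threshold: the fraction of impostor scores
    at or above the threshold, i.e. wrongly accepted as the same person."""
    n = len(scores)
    return [sum(s >= t for s in scores) / n for t in thresholds]


if __name__ == "__main__":
    # Toy 2-D "templates"; real templates are much longer vectors.
    templates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = impostor_scores(templates)
    print(fpr_curve(scores, [0.0, 0.5, 0.8]))  # → [1.0, 0.6666666666666666, 0.0]
```

Because each pair is scored exactly once, n templates yield n(n-1)/2 impostor comparisons, and plotting `fpr_curve` over a range of thresholds gives the FPR-vs-threshold curve described above.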

Comparison of bias in TFV4 vs TFV5

The plots above compare the bias in our TFV4 model to that of our recently released TFV5 model for each of the tested ethnicity and gender groups. TFV5 saw a decrease in bias across all ethnicity and gender groups. Of equal importance, TFV5 saw a significant reduction in bias in traditionally underrepresented groups such as East Asians and Southeast Asians. This improvement is the result of supplementing our training dataset with additional ethically sourced data from these underrepresented groups.
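To compare two models group by group, impostor scores can be bucketed by demographic label and evaluated at a single operating threshold. The sketch below assumes each impostor comparison carries one group label (for example, both faces belong to the same ethnicity and gender group); that labeling scheme, the function names, and the toy scores are all hypothetical and only illustrate the kind of per-group comparison the plots show.

```python
from collections import defaultdict


def fpr_by_group(comparisons, threshold):
    """False Positive Rate per group at one operating threshold.

    comparisons: iterable of (group_label, impostor_score) pairs.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for group, score in comparisons:
        totals[group] += 1
        if score >= threshold:  # wrongly accepted as a match
            false_positives[group] += 1
    return {group: false_positives[group] / totals[group] for group in totals}


def fpr_change(old, new):
    """Per-group FPR difference (new minus old); negative means fewer false positives."""
    return {group: new[group] - old[group] for group in old if group in new}


if __name__ == "__main__":
    # Made-up scores for two hypothetical groups, "A" and "B".
    old = fpr_by_group([("A", 0.5), ("A", 0.1), ("B", 0.45), ("B", 0.42)], 0.4)
    new = fpr_by_group([("A", 0.2), ("A", 0.1), ("B", 0.41), ("B", 0.3)], 0.4)
    print(fpr_change(old, new))  # → {'A': -0.5, 'B': -0.5}
```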

In general, we advise our clients to operate at a similarity score threshold between 0.3 and 0.4, though the exact threshold is ultimately dictated by the desired False Positive Rate or False Negative Rate. As the two plots below show, TFV5 produces significantly fewer false positives in this operating region for all ethnicities. In a real-world scenario that uses face recognition as a single authentication factor for access control, TFV5's performance improvement translates into increased security: it halts more would-be intruders than our previous models. Pairing it with a second factor, such as an RFID card or a PIN, increases security further.
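Because raising the threshold lowers the False Positive Rate but raises the False Negative Rate, one common way to pick an operating point is to take the lowest threshold that still meets a target FPR. The sketch below scans a grid of thresholds over [0, 1] in steps of 0.01; the grid, the function name, and the rule that a score at or above the threshold counts as a match are assumptions for illustration, not part of any Trueface SDK.

```python
def lowest_threshold_for_fpr(scores, target_fpr, step=0.01):
    """Return the lowest grid threshold in [0, 1] whose False Positive Rate
    on the given impostor scores is at most target_fpr, or None if none is."""
    n = len(scores)
    num_steps = int(round(1 / step))
    for i in range(num_steps + 1):
        # round() keeps grid values like 0.39 free of accumulated float error
        threshold = round(i * step, 10)
        fpr = sum(s >= threshold for s in scores) / n
        if fpr <= target_fpr:
            return threshold
    return None


if __name__ == "__main__":
    # Ten made-up impostor scores, for illustration only.
    scores = [0.05, 0.12, 0.25, 0.31, 0.38, 0.45, 0.10, 0.20, 0.15, 0.33]
    print(lowest_threshold_for_fpr(scores, target_fpr=0.1))  # → 0.39
```

Since FPR can only fall and FNR can only rise as the threshold increases, stopping at the first grid threshold that meets the FPR target keeps the False Negative Rate as low as that target allows.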

TFV4 False Positive Rates by ethnicity

TFV5 False Positive Rates by ethnicity

At an operating threshold of 0.4, TFV5 correctly rejects 9,999 of every 10,000 impostor comparisons, a False Positive Rate of 0.01%. Raising the operating threshold will reduce this False Positive Rate further.
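Read as a rate, the 9,999-in-10,000 figure means one false positive (FP) for every 9,999 correctly rejected impostor comparisons (true negatives, TN):

```latex
\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = \frac{1}{1 + 9{,}999} = 10^{-4} = 0.01\%
```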

As leaders in the computer vision industry, we have a responsibility to achieve parity of performance across all ethnicities and genders. The benefits of this powerful technology should be equitable. Let us state plainly that we will not stop until this goal has been achieved. We are excited to realize the advantages of this technology together.