Evaluating the Performance of Facial Recognition Algorithms on Morphed Human Faces


Abstract


This paper evaluates the effectiveness of six face recognition algorithms (DeepID, Facenet, Facenet512, OpenFace, SFace, and VGG-Face) on seven datasets, including sets morphed with the FaceFusion, OpenCV, FaceMorpher, and UBO techniques. It assesses the ability of these algorithms to identify individuals from morphed faces compared with standard, differently angled, and differently illuminated images.


Through 585,000 comparisons across 150 reference photos, notable performance differences emerge, highlighting the impact of brightness adjustments in probe images and the superior performance of models such as Facenet and Facenet512. In addition, the study examines the distance-threshold difference and its effect on model accuracy in discriminating morphed images.


Finally, the primary errors and image features that lead to misclassifications are analyzed to gain insight into common errors.


Introduction


Facial recognition technology, a rapidly evolving field in biometrics, has unique advantages over other modalities. The face serves as a highly distinctive biometric characteristic, offering high convenience and acceptance among data subjects.


Additionally, it allows for data capture without physical contact, leveraging high-resolution cameras in smartphones operating primarily in the visible spectrum, thus eliminating the need for specialized capture devices.


The performance of facial recognition systems relies on accurately capturing and analyzing facial features through several critical processing stages. These stages include segmentation, which detects and tracks individuals within the captured scene and locates the face or region of interest. Following segmentation, image pre-processing improves signal quality. Then, feature extraction and comparison analyze the texture of the segmented region and compare probe images to reference feature vectors to verify individuals' identities.
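The stages above can be sketched as a minimal verification pipeline. This is an illustrative skeleton, not the paper's implementation: the function names, the 128-dimensional embedding, the deterministic stub extractor, and the 0.4 cosine-distance threshold are all assumptions for demonstration.

```python
import numpy as np

# Illustrative stubs for the stages described above; a real system would use
# a face detector (segmentation), image filters (pre-processing), and a
# trained neural network (feature extraction).

def segment(image):
    # Locate the face / region of interest; here, simply pass the image through.
    return image

def preprocess(face):
    # Improve signal quality, e.g. normalize pixel intensities to [0, 1].
    return face / 255.0

def extract_features(face):
    # Map the face to a fixed-length feature vector (embedding).
    # Stub: a deterministic pseudo-embedding seeded from the pixel sum.
    rng = np.random.default_rng(int(face.sum()) % 2**32)
    return rng.standard_normal(128)

def compare(probe_vec, reference_vec, threshold=0.4):
    # Cosine distance between probe and reference embeddings;
    # accept the identity claim if the distance is under the threshold.
    cos_sim = probe_vec @ reference_vec / (
        np.linalg.norm(probe_vec) * np.linalg.norm(reference_vec))
    return (1.0 - cos_sim) <= threshold

probe = np.full((4, 4), 120.0)      # dummy "images" standing in for photos
reference = np.full((4, 4), 120.0)
same = compare(extract_features(preprocess(segment(probe))),
               extract_features(preprocess(segment(reference))))
```

Identical inputs yield identical embeddings, so `same` is true; embeddings pointing in unrelated directions fall outside the threshold and are rejected.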



Background


Using the Face Recognition Vendor Test (FRVT) dataset, this study selects several open-source models and evaluates their effectiveness without relying on high-frequency data or self-enhancement techniques.


Models Evaluated


  • VGG-Face: Developed by the Visual Geometry Group at the University of Oxford, this model is a variant of the VGG neural network specifically designed for facial recognition.


  • FaceNet: Created by Google, FaceNet is known for its efficiency and performance in face detection and recognition, achieving remarkable accuracy on the LFW dataset and YouTube Faces DB.


  • ArcFace: Jointly developed by researchers from Imperial College London and InsightFace, it introduces a novel loss function that significantly improves face recognition accuracy.


  • DeepID: A series of models focused on deep learning-based identification, aiming to identify individuals across varying poses, expressions, and lighting conditions with high precision.


  • SFace: A deep neural network-based model trained on a large dataset of faces, aiming to bridge the gap between machine and human-level performance in facial recognition tasks.


  • OpenFace: An open-source face recognition system developed at Carnegie Mellon University, focusing on real-time face recognition and tracking.


Methodology


The study conducts experiments using reference, probe, and probe light datasets alongside four morphing techniques (FaceFusion, OpenCV, FaceMorpher, and UBO Morpher). The key questions and hypotheses developed for this research include:


  • What was the accuracy of each method in identifying reference faces? This analysis included not only assessing overall accuracy but also identifying instances of false positives and false negatives.


  • How are the differences between distance and threshold distributed for each method? In face recognition systems, distance measures the similarity or dissimilarity between two faces based on the features extracted from them. A threshold is a predefined value that determines whether two faces are recognized as belonging to the same person. The difference between the threshold and the distance was calculated to determine how close a face came to being accepted as the same person.


  • Which faces are most often misidentified? Is there a pattern among the faces that are misidentified most often? This also included detecting similar features shared by these misidentified individuals.
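The distance-threshold difference described in the questions above can be computed directly from a pair of embeddings. This is a minimal sketch: the cosine distance formula matches the metric named later in the paper, but the 0.40 threshold and the sample vectors are illustrative assumptions, not the per-model thresholds the study used.

```python
import numpy as np

def cosine_distance(a, b):
    # Dissimilarity between two embeddings:
    # 0 = identical direction, 1 = orthogonal, 2 = opposite.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def threshold_margin(probe_vec, ref_vec, threshold=0.40):
    # The distance-threshold difference: positive means the pair is
    # accepted with room to spare, negative means it is rejected.
    return threshold - cosine_distance(probe_vec, ref_vec)

a = np.array([0.9, 0.1, 0.3])   # illustrative embeddings
b = np.array([0.8, 0.2, 0.35])
margin = threshold_margin(a, b)
accepted = margin >= 0
```

A large positive margin indicates a confident match; a margin near zero flags a borderline decision, which is exactly the quantity the paper's distribution analysis examines.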


Roadmap


The methodology involves selecting 50 individuals from the reference dataset and creating morphed faces using the four techniques. The study then compares standard, differently lit, and morphed faces using the six models, resulting in 585,000 comparisons across 42 experiments (six models on seven datasets). The paper follows this roadmap:



Fig 1. Followed Roadmap


Scripts and Running Time

For each dataset, a program looped over the command

DeepFace.find(img_path=full_img_path, db_path=db_path, model_name=model, distance_metric="cosine")[0]

to extract the photos identified as identical. Due to computational constraints, a representative sample of 50 individuals was selected, with each experiment taking approximately 30 minutes.


Results


Accuracy Implementation

To measure accuracy, the number of true positives was divided by the number of matches required for 100% accuracy. The results indicate that models such as Facenet and Facenet512 perform better under various conditions. Increased brightness in probe images enhances performance, and morphed faces generally show lower recognition accuracy, indicating some resistance of the models to morphing attacks.
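Under this definition, accuracy is a simple ratio. The tally below is made up for illustration; the 50-identity, 150-match figures mirror the study's sample size but the true-positive count is a hypothetical value, not a reported result.

```python
def accuracy(true_positives, expected_matches):
    # Fraction of required matches actually found;
    # 1.0 means every reference face was correctly identified.
    return true_positives / expected_matches

# Hypothetical tally for one model/dataset pair: 50 reference identities,
# 3 probes each -> 150 matches required for 100% accuracy.
acc = accuracy(true_positives=132, expected_matches=150)  # 0.88
```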



Fig 2. Accuracy Chart


Errors

DeepID showed a high false-positive rate, accepting faces as identical when they were not, while FaceNet and FaceNet512 had lower error rates. The study also analyzed the number of errors across the different databases.
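The error counts discussed here can be tallied from per-comparison decisions. A minimal sketch, assuming each comparison is reduced to a (predicted, actual) pair; the sample data is fabricated for illustration, not taken from the paper.

```python
def count_errors(decisions):
    # decisions: list of (predicted_same, actually_same) pairs.
    fp = sum(pred and not actual for pred, actual in decisions)  # accepted impostors
    fn = sum(actual and not pred for pred, actual in decisions)  # rejected genuine pairs
    return fp, fn

# Five hypothetical comparisons: one correct accept, two false accepts,
# one false reject, one correct reject.
sample = [(True, True), (True, False), (False, True),
          (False, False), (True, False)]
fp, fn = count_errors(sample)  # fp=2, fn=1
```

A model like DeepID with a high acceptance rate would show a large `fp` count under this tally, which is the behavior Fig 3 breaks down per model.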



Fig 3. Model Errors


Distance-Threshold Difference

Analyzing the distribution of distance-threshold differences revealed that the more accurate models have narrower distributions, indicating better threshold setting. Filtering matches by their distance-threshold difference improves accuracy, suggesting a potential line of research for future studies.
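One way to read this filtering idea: treat accepted matches whose distance sits close to the threshold as unreliable and drop them before scoring. The direction of the margin, the 0.05 cutoff, and the synthetic data below are all assumptions for illustration, not the paper's procedure.

```python
def filtered_accuracy(results, threshold=0.40, min_margin=0.05):
    # results: list of (distance, actually_same) for accepted matches.
    # Keep only matches whose distance clears the threshold by min_margin.
    kept = [(d, same) for d, same in results if threshold - d >= min_margin]
    if not kept:
        return 0.0
    return sum(same for _, same in kept) / len(kept)

# Synthetic accepted matches: the borderline ones (distance near 0.40)
# happen to be wrong, so filtering them out raises accuracy.
results = [(0.10, True), (0.15, True), (0.38, False),
           (0.39, False), (0.20, True)]
loose = sum(s for _, s in results) / len(results)  # no margin filter
strict = filtered_accuracy(results)                # margin filter applied
```

On this toy data, accuracy rises from 0.6 to 1.0 once the near-threshold matches are discarded, mirroring the improvement the paper observes.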



Fig 4. Distance-Threshold Difference


Misidentified Faces

The study examined frequently misidentified faces and identified certain images that were misidentified more often, though no significant pattern emerged. It also highlighted misidentified image pairs with a high error rate, noting a pattern among similar-looking individuals, particularly Asian women.



Fig 5. Misidentified Faces


Discussion


The results show that DeepID’s high acceptance rate requires further investigation, and brightness adjustments can significantly improve recognition accuracy. The study emphasizes the importance of refining distance-threshold analysis to enhance model accuracy. The identification of common misclassification errors provides valuable insights for improving facial recognition systems.


Limitations and Future Work

Due to computational constraints, the study used a smaller dataset. Future work should involve larger datasets and focus on refining distance-threshold analysis to improve model accuracy.


Conclusion


This paper highlights the superior performance of FaceNet models and underscores the importance of understanding distance-threshold relationships and common misclassification errors to develop more reliable facial recognition technologies. These findings are crucial for mitigating morphing attacks and enhancing security in real-world applications.


Read the full paper here:



2024 by Sandra Gomez
