Introduction

Introduction to MTCNN¶

1. History of MTCNN¶

MTCNN Pipeline

Figure 1: The MTCNN Pipeline for face detection.

MTCNN (Multitask Cascaded Convolutional Networks) was first introduced in a 2016 paper titled "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks" by Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. This work was published in the IEEE Signal Processing Letters and later presented at the 2017 IEEE International Conference on Computer Vision (ICCV).

The method quickly became popular due to its ability to perform both face detection and facial landmark alignment in a single pipeline. It was designed to efficiently detect faces at different scales and orientations while also predicting key facial landmarks such as eyes, nose, and mouth. This multitask approach reduced the computational cost compared to running separate models for face detection and alignment.

2. The MTCNN Method¶

MTCNN uses a cascaded structure of three convolutional neural networks (CNNs) that work together to progressively refine face proposals and detect key landmarks. The networks are:

PNet (Proposal Network): This network scans the image at different scales to generate candidate face regions (bounding boxes).
RNet (Refinement Network): RNet takes the candidate face regions from PNet, refines them by filtering false positives, and regresses the bounding boxes.
ONet (Output Network): The final stage, ONet, further refines the bounding boxes and detects five facial landmarks: the eyes, nose, and the corners of the mouth.

The overall process is hierarchical, with each network focusing on more precise tasks as the proposal gets closer to the final face detection. This cascading structure helps balance accuracy and speed, ensuring high performance even in challenging conditions like varying lighting, pose, and facial occlusions.

Here’s an illustration of the architecture:

Figure 2: The MTCNN architecture consists of three networks (PNet, RNet, and ONet) that progressively refine face detection and alignment.

3. History of This Package¶

The original implementation of MTCNN was released in 2018 as an open-source project based on the original paper. Since then, it has been widely adopted in various computer vision tasks involving face detection and alignment, with many libraries and applications using the MTCNN model.

In 2024, a major refactor and optimization of the MTCNN package was undertaken to modernize the codebase, making it more robust, efficient, and compatible with the latest versions of TensorFlow (>2.17). Key improvements include:

A cleaner project structure with modular components for better maintainability.
Support for batch processing to handle multiple images at once.
Removal of outdated dependencies like OpenCV, switching to TensorFlow for image processing.
Full documentation and optimized performance through matrix-based operations.

This version of MTCNN retains the simplicity of the original interface while providing more flexibility and support for a broader range of use cases.