In [1]:

import sys
sys.path.insert(0, "..")

MTCNN PNet

This notebook demonstrates the PNet architecture and its corresponding weights.

PNet is a fully convolutional neural network (CNN) used in the first stage of MTCNN. This network processes inputs of variable size and generates bounding box proposals. It produces two outputs:

Regression of the bounding box coordinates within the convolutional receptive field.
Classification of the receptive field into two categories: no-face or face.

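Both outputs are produced for every receptive field: because the network is fully convolutional, each spatial position of the output maps corresponds to one window of the input image, so larger inputs simply yield larger output maps.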
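To make the notion of a receptive field concrete, the following minimal sketch computes the receptive field and effective stride of PNet from its layer geometry. The 3×3 and 1×1 kernel sizes agree with the weight shapes of this model; the 2×2 pooling window with stride 2 is an assumption taken from the original MTCNN design and is not shown in this notebook. The names `PNET_LAYERS` and `receptive_field` are illustrative and not part of the mtcnn package.

```python
# (name, kernel_size, stride) per layer.
# The pooling entry is an assumption based on the original MTCNN design.
PNET_LAYERS = [
    ("conv1", 3, 1),
    ("maxpooling1", 2, 2),
    ("conv2", 3, 1),
    ("conv3", 3, 1),
    ("conv4", 1, 1),  # conv4-1 and conv4-2 share this geometry
]


def receptive_field(layers):
    """Return (receptive field, effective stride) in input pixels."""
    rf, jump = 1, 1
    for _name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the window by (k - 1) * jump
        jump *= stride             # distance between neighbouring outputs, in input pixels
    return rf, jump


rf, stride = receptive_field(PNET_LAYERS)
print(f"receptive field: {rf}x{rf} pixels, stride: {stride} pixels")
```

Under these assumptions each output position scores one 12×12 window of the input, and neighbouring positions are 2 pixels apart.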

In the following sections, we will run the MTCNN model, focusing solely on the PNet stage. We will examine the intermediate inputs, observe the output shapes, and visualize the results.

MTCNN on PNet Stage

MTCNN can be configured to run only up to the first stage, which will provide the direct output of the PNet stage.

In [2]:

from mtcnn import MTCNN
from mtcnn.utils.images import load_image
from mtcnn.utils.tensorflow import set_gpu_memory_growth
from mtcnn.stages import StagePNet

2024-10-02 19:38:21.331861: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-02 19:38:21.342042: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-02 19:38:21.354494: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-02 19:38:21.358349: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-02 19:38:21.367690: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-02 19:38:22.024220: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

In [3]:

# Avoid reserving excessive GPU memory up front (only relevant when a GPU is used)
set_gpu_memory_growth()

In [4]:

image = load_image("../resources/ivan.jpg")

2024-10-02 19:38:22.806604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1312 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:65:00.0, compute capability: 8.6
2024-10-02 19:38:22.807033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 7363 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:17:00.0, compute capability: 6.1

In [5]:

mtcnn = MTCNN(stages=[StagePNet], device="CPU:0")    # other devices: GPU:0  ,  GPU:1  , ...

In [6]:

%%time
result = mtcnn.detect_faces(image, postprocess=True)

CPU times: user 413 ms, sys: 111 ms, total: 524 ms
Wall time: 310 ms

In [7]:

result

Out[7]:

[{'box': [270, 89, 61, 61], 'confidence': 0.9999668598175049},
 {'box': [271, 89, 71, 71], 'confidence': 0.9997212290763855},
 {'box': [490, 209, 54, 54], 'confidence': 0.9992153644561768},
 {'box': [187, 243, 38, 38], 'confidence': 0.998630166053772},
 {'box': [480, 285, 57, 57], 'confidence': 0.9982782602310181},
 {'box': [296, 100, 32, 32], 'confidence': 0.9957242012023926},
 {'box': [192, 43, 108, 108], 'confidence': 0.9916715025901794},
 {'box': [101, 408, 42, 42], 'confidence': 0.9912404417991638},
 {'box': [97, 405, 52, 52], 'confidence': 0.9852192401885986},
 {'box': [11, 180, 43, 43], 'confidence': 0.9849668145179749},
 {'box': [8, 386, 31, 31], 'confidence': 0.9844192862510681},
 {'box': [394, 399, 48, 48], 'confidence': 0.9816769361495972},
 {'box': [14, 313, 40, 40], 'confidence': 0.9804034233093262},
 {'box': [184, 59, 18, 18], 'confidence': 0.9791208505630493},
 {'box': [495, 143, 58, 58], 'confidence': 0.9790045022964478},
 {'box': [286, 218, 62, 62], 'confidence': 0.9768547415733337},
 {'box': [344, 132, 20, 20], 'confidence': 0.9743143916130066},
 {'box': [403, 394, 41, 41], 'confidence': 0.9722734093666077},
 {'box': [180, 241, 46, 46], 'confidence': 0.9710206985473633},
 {'box': [496, 214, 41, 41], 'confidence': 0.9705135822296143},
 {'box': [275, 104, 30, 30], 'confidence': 0.9698752164840698},
 {'box': [144, 391, 78, 78], 'confidence': 0.9693538546562195},
 {'box': [4, 176, 54, 54], 'confidence': 0.9685015082359314},
 {'box': [187, 140, 40, 40], 'confidence': 0.9677426218986511},
 {'box': [283, 99, 45, 45], 'confidence': 0.967420756816864},
 {'box': [534, 382, 20, 20], 'confidence': 0.9653154611587524},
 {'box': [271, 99, 45, 45], 'confidence': 0.9631991386413574},
 {'box': [101, 509, 17, 17], 'confidence': 0.9630200862884521},
 {'box': [499, 289, 39, 39], 'confidence': 0.961385190486908},
 {'box': [290, 124, 32, 32], 'confidence': 0.9606941938400269},
 {'box': [334, 128, 28, 28], 'confidence': 0.9601700305938721},
 {'box': [250, 104, 21, 21], 'confidence': 0.9600563049316406},
 {'box': [182, 98, 19, 19], 'confidence': 0.9569499492645264},
 {'box': [338, 152, 19, 19], 'confidence': 0.9563547968864441},
 {'box': [8, 235, 58, 58], 'confidence': 0.9557236433029175},
 {'box': [1, 386, 40, 40], 'confidence': 0.9545572400093079},
 {'box': [513, 371, 39, 39], 'confidence': 0.9491947293281555},
 {'box': [322, 191, 27, 27], 'confidence': 0.9456313848495483},
 {'box': [470, 50, 53, 53], 'confidence': 0.9440603852272034},
 {'box': [100, 411, 30, 30], 'confidence': 0.9404458999633789},
 {'box': [31, 341, 32, 32], 'confidence': 0.937527060508728},
 {'box': [323, 188, 20, 20], 'confidence': 0.9356555938720703},
 {'box': [489, 434, 29, 29], 'confidence': 0.9347164630889893},
 {'box': [355, 260, 18, 18], 'confidence': 0.9298021197319031},
 {'box': [1, 396, 21, 21], 'confidence': 0.9291993975639343},
 {'box': [270, 56, 147, 147], 'confidence': 0.9255051016807556},
 {'box': [476, 270, 73, 73], 'confidence': 0.924798309803009},
 {'box': [506, 294, 22, 22], 'confidence': 0.9207442402839661},
 {'box': [73, 58, 225, 225], 'confidence': 0.9173569083213806},
 {'box': [262, 71, 101, 101], 'confidence': 0.9164451956748962},
 {'box': [13, 72, 31, 31], 'confidence': 0.9129998683929443},
 {'box': [26, 340, 39, 39], 'confidence': 0.9100756049156189},
 {'box': [239, 97, 31, 31], 'confidence': 0.9052125215530396},
 {'box': [148, 405, 36, 36], 'confidence': 0.8971834778785706},
 {'box': [445, 379, 43, 43], 'confidence': 0.8947854042053223},
 {'box': [446, 215, 22, 22], 'confidence': 0.8917657136917114},
 {'box': [239, 233, 81, 81], 'confidence': 0.8911052346229553},
 {'box': [220, 287, 20, 20], 'confidence': 0.8855998516082764},
 {'box': [36, 341, 24, 24], 'confidence': 0.8843594193458557},
 {'box': [481, 198, 76, 76], 'confidence': 0.8838769197463989},
 {'box': [17, 390, 21, 21], 'confidence': 0.8799570202827454},
 {'box': [4, 303, 55, 55], 'confidence': 0.8785687685012817},
 {'box': [430, 217, 19, 19], 'confidence': 0.8763736486434937},
 {'box': [206, 79, 23, 23], 'confidence': 0.8737393617630005},
 {'box': [7, 73, 42, 42], 'confidence': 0.8733800053596497},
 {'box': [174, 127, 72, 72], 'confidence': 0.8731698393821716},
 {'box': [280, 106, 22, 22], 'confidence': 0.8657463192939758},
 {'box': [523, 456, 21, 21], 'confidence': 0.8632909059524536},
 {'box': [62, 349, 28, 28], 'confidence': 0.8600795865058899},
 {'box': [476, 63, 28, 28], 'confidence': 0.8581259250640869},
 {'box': [489, 434, 35, 35], 'confidence': 0.8565669059753418},
 {'box': [24, 367, 20, 20], 'confidence': 0.853937566280365},
 {'box': [3, 176, 72, 72], 'confidence': 0.8522983193397522},
 {'box': [0, 297, 20, 20], 'confidence': 0.851826012134552},
 {'box': [42, 358, 78, 78], 'confidence': 0.8504625558853149},
 {'box': [342, 102, 23, 23], 'confidence': 0.8466385006904602},
 {'box': [335, 148, 26, 26], 'confidence': 0.8402417302131653},
 {'box': [374, 395, 77, 77], 'confidence': 0.837632417678833},
 {'box': [293, 160, 30, 30], 'confidence': 0.8371832370758057},
 {'box': [107, 369, 150, 150], 'confidence': 0.8341783881187439},
 {'box': [283, 148, 31, 31], 'confidence': 0.8329155445098877},
 {'box': [18, 72, 23, 23], 'confidence': 0.8310617804527283},
 {'box': [533, 271, 20, 20], 'confidence': 0.8309110403060913},
 {'box': [2, 314, 43, 43], 'confidence': 0.8295050859451294},
 {'box': [2, 247, 40, 40], 'confidence': 0.8290241956710815},
 {'box': [136, 387, 97, 97], 'confidence': 0.8286371827125549},
 {'box': [301, 220, 49, 49], 'confidence': 0.8285456299781799},
 {'box': [22, 184, 31, 31], 'confidence': 0.8255282044410706},
 {'box': [143, 419, 28, 28], 'confidence': 0.8249657154083252},
 {'box': [10, 74, 22, 22], 'confidence': 0.8228946924209595},
 {'box': [190, 2, 22, 22], 'confidence': 0.8213641047477722},
 {'box': [424, 483, 34, 34], 'confidence': 0.8204600214958191},
 {'box': [201, 205, 22, 22], 'confidence': 0.81780606508255},
 {'box': [189, 120, 30, 30], 'confidence': 0.8163595795631409},
 {'box': [10, 132, 29, 29], 'confidence': 0.8141602277755737},
 {'box': [39, 217, 23, 23], 'confidence': 0.8135595321655273},
 {'box': [185, 128, 58, 58], 'confidence': 0.810321569442749},
 {'box': [173, 424, 20, 20], 'confidence': 0.8083855509757996},
 {'box': [435, 212, 33, 33], 'confidence': 0.8042281866073608},
 {'box': [206, 62, 21, 21], 'confidence': 0.8023461699485779},
 {'box': [498, 152, 30, 30], 'confidence': 0.8022951483726501},
 {'box': [49, 377, 56, 56], 'confidence': 0.8021646738052368},
 {'box': [511, 33, 40, 40], 'confidence': 0.8009828925132751},
 {'box': [31, 341, 79, 79], 'confidence': 0.7994623184204102},
 {'box': [455, 401, 79, 79], 'confidence': 0.7946075201034546},
 {'box': [153, 112, 102, 102], 'confidence': 0.7888069152832031},
 {'box': [188, 96, 60, 60], 'confidence': 0.7880174517631531},
 {'box': [191, 121, 21, 21], 'confidence': 0.7873377799987793},
 {'box': [103, 53, 170, 170], 'confidence': 0.7869991064071655},
 {'box': [161, 31, 154, 154], 'confidence': 0.7862122654914856},
 {'box': [339, 172, 28, 28], 'confidence': 0.7811397314071655},
 {'box': [194, 135, 26, 26], 'confidence': 0.7713541388511658},
 {'box': [524, 267, 28, 28], 'confidence': 0.7680309414863586},
 {'box': [319, 164, 19, 19], 'confidence': 0.7631727457046509},
 {'box': [236, 101, 37, 37], 'confidence': 0.7625581622123718},
 {'box': [2, 1, 57, 57], 'confidence': 0.7596020698547363},
 {'box': [278, 136, 46, 46], 'confidence': 0.7581404447555542},
 {'box': [284, 153, 24, 24], 'confidence': 0.7557078003883362},
 {'box': [221, 212, 150, 150], 'confidence': 0.753204882144928},
 {'box': [513, 368, 30, 30], 'confidence': 0.7531015276908875},
 {'box': [464, 454, 21, 21], 'confidence': 0.74482661485672},
 {'box': [499, 148, 39, 39], 'confidence': 0.7422949075698853},
 {'box': [277, 135, 56, 56], 'confidence': 0.7366361618041992},
 {'box': [304, 28, 59, 59], 'confidence': 0.7317830920219421},
 {'box': [503, 293, 30, 30], 'confidence': 0.729342520236969},
 {'box': [486, 333, 23, 23], 'confidence': 0.728617250919342},
 {'box': [189, 142, 29, 29], 'confidence': 0.7246003746986389},
 {'box': [356, 387, 21, 21], 'confidence': 0.7240045070648193},
 {'box': [184, 205, 23, 23], 'confidence': 0.723656177520752},
 {'box': [334, 99, 38, 38], 'confidence': 0.7213565707206726},
 {'box': [501, 27, 51, 51], 'confidence': 0.7170071005821228},
 {'box': [273, 266, 38, 38], 'confidence': 0.7144962549209595},
 {'box': [252, 493, 40, 40], 'confidence': 0.7130072116851807},
 {'box': [453, 215, 20, 20], 'confidence': 0.706762969493866},
 {'box': [63, 396, 43, 43], 'confidence': 0.7053548693656921},
 {'box': [313, 189, 39, 39], 'confidence': 0.7040255069732666},
 {'box': [15, 241, 31, 31], 'confidence': 0.6972864866256714},
 {'box': [219, 161, 18, 18], 'confidence': 0.6943190693855286},
 {'box': [43, 9, 31, 31], 'confidence': 0.6927041411399841},
 {'box': [303, 5, 27, 27], 'confidence': 0.6924176812171936},
 {'box': [301, 259, 53, 53], 'confidence': 0.6918803453445435},
 {'box': [478, 319, 40, 40], 'confidence': 0.6887754201889038},
 {'box': [67, 508, 58, 52], 'confidence': 0.6868264079093933},
 {'box': [184, 112, 43, 43], 'confidence': 0.6865329742431641},
 {'box': [334, 135, 18, 18], 'confidence': 0.6855722069740295},
 {'box': [36, 350, 23, 23], 'confidence': 0.6833070516586304},
 {'box': [177, 95, 25, 25], 'confidence': 0.6830892562866211},
 {'box': [159, 420, 38, 38], 'confidence': 0.682868480682373},
 {'box': [318, 138, 19, 19], 'confidence': 0.6816803216934204},
 {'box': [263, 423, 29, 29], 'confidence': 0.6813008189201355},
 {'box': [284, 199, 20, 20], 'confidence': 0.6787427663803101},
 {'box': [67, 352, 21, 21], 'confidence': 0.6717443466186523},
 {'box': [481, 23, 74, 74], 'confidence': 0.6704385876655579},
 {'box': [523, 452, 31, 31], 'confidence': 0.6700493097305298},
 {'box': [243, 334, 76, 76], 'confidence': 0.6653152108192444},
 {'box': [454, 338, 29, 29], 'confidence': 0.6650230884552002},
 {'box': [49, 95, 22, 22], 'confidence': 0.6635971069335938},
 {'box': [321, 84, 55, 55], 'confidence': 0.6603143215179443},
 {'box': [480, 325, 31, 31], 'confidence': 0.6586322784423828},
 {'box': [294, 135, 24, 24], 'confidence': 0.6576036810874939},
 {'box': [60, 347, 39, 39], 'confidence': 0.6554562449455261},
 {'box': [458, 406, 21, 21], 'confidence': 0.65467768907547},
 {'box': [342, 138, 23, 23], 'confidence': 0.6540101766586304},
 {'box': [540, 441, 20, 22], 'confidence': 0.653633713722229},
 {'box': [300, 127, 25, 25], 'confidence': 0.6521259546279907},
 {'box': [170, 133, 54, 54], 'confidence': 0.6484688520431519},
 {'box': [20, 192, 22, 22], 'confidence': 0.644957959651947},
 {'box': [518, 296, 28, 28], 'confidence': 0.6440291404724121},
 {'box': [245, 522, 43, 38], 'confidence': 0.6340025067329407},
 {'box': [436, 367, 58, 58], 'confidence': 0.6332893967628479},
 {'box': [234, 233, 108, 108], 'confidence': 0.6274054646492004},
 {'box': [28, 85, 53, 53], 'confidence': 0.6244142055511475},
 {'box': [254, 502, 30, 30], 'confidence': 0.624413788318634},
 {'box': [319, 182, 37, 37], 'confidence': 0.6236416101455688},
 {'box': [29, 21, 31, 31], 'confidence': 0.6222331523895264},
 {'box': [9, 182, 33, 33], 'confidence': 0.6211090683937073},
 {'box': [17, 248, 21, 21], 'confidence': 0.6192639470100403},
 {'box': [141, 398, 54, 54], 'confidence': 0.618570864200592},
 {'box': [74, 386, 30, 30], 'confidence': 0.6184467673301697},
 {'box': [198, 203, 28, 28], 'confidence': 0.6183221936225891},
 {'box': [336, 103, 22, 22], 'confidence': 0.6169424653053284},
 {'box': [253, 530, 30, 30], 'confidence': 0.6161786317825317},
 {'box': [199, 58, 77, 77], 'confidence': 0.6141642332077026},
 {'box': [510, 87, 41, 41], 'confidence': 0.6061983704566956},
 {'box': [23, 212, 39, 39], 'confidence': 0.6061719655990601},
 {'box': [292, 267, 63, 63], 'confidence': 0.605388343334198},
 {'box': [446, 25, 112, 112], 'confidence': 0.604427695274353},
 {'box': [342, 147, 20, 20], 'confidence': 0.6038945317268372},
 {'box': [33, 249, 30, 30], 'confidence': 0.6038562655448914}]

The output is a list of dictionaries, each holding a bounding box in [x, y, width, height] form and its confidence score, listed in order of decreasing confidence. The following cell plots these boxes on the image:

In [8]:

from mtcnn.utils.plotting import plot
import matplotlib.pyplot as plt

plt.imshow(plot(image, result))

Out[8]:

<matplotlib.image.AxesImage at 0x7f537815f550>

[Figure: the input image with the PNet bounding box proposals drawn on it]

As the plot shows, PNet proposes many candidate bounding boxes, a large share of which overlap or do not contain a face. These proposals must be "refined" to discard the ones that do not fit, which is the job of the next stage, RNet.
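How strongly the proposals overlap can be quantified with intersection over union (IoU). The sketch below applies it to the two most confident boxes from the output above. It assumes the boxes are in [x, y, width, height] form; the helpers `to_corners` and `iou` are illustrative and are not functions of the mtcnn package.

```python
def to_corners(box):
    """Convert [x, y, width, height] to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x, y, x + w, y + h


def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union else 0.0


# The two most confident proposals from the result shown earlier.
top_two = [[270, 89, 61, 61], [271, 89, 71, 71]]
overlap = iou(*top_two)
print(f"IoU of the two most confident proposals: {overlap:.2f}")
```

An IoU of roughly 0.72 suggests that both proposals describe largely the same region, which is the kind of redundancy that later filtering is meant to remove.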

Accessing PNet's model

The network can be accessed by instantiating StagePNet and reading its model attribute, which is a TensorFlow model.

In [9]:

stage = StagePNet()
model = stage.model

In [10]:

model.summary()

Model: "p_net_1"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv1 (Conv2D)                  │ (None, None, None, 10) │           280 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu1 (PReLU)                  │ (None, None, None, 10) │            10 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ maxpooling1 (MaxPooling2D)      │ (None, None, None, 10) │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2 (Conv2D)                  │ (None, None, None, 16) │         1,456 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu2 (PReLU)                  │ (None, None, None, 16) │            16 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv3 (Conv2D)                  │ (None, None, None, 32) │         4,640 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu3 (PReLU)                  │ (None, None, None, 32) │            32 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv4-1 (Conv2D)                │ (None, None, None, 4)  │           132 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv4-2 (Conv2D)                │ (None, None, None, 2)  │            66 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 6,632 (25.91 KB)

 Trainable params: 6,632 (25.91 KB)

 Non-trainable params: 0 (0.00 B)
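The Param # column of the summary can be reproduced by hand. The following sketch assumes a 3-channel (RGB) input, 3×3 kernels in conv1 to conv3, 1×1 kernels in the two output heads, and one learnable PReLU slope per channel; these assumptions are consistent with the counts in the summary but are inferred rather than read from the library. The helper names are illustrative.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def prelu_params(channels):
    """One learnable slope per channel (assumed to be shared across positions)."""
    return channels


layer_params = {
    "conv1": conv2d_params(3, 3, 10),
    "prelu1": prelu_params(10),
    "conv2": conv2d_params(3, 10, 16),
    "prelu2": prelu_params(16),
    "conv3": conv2d_params(3, 16, 32),
    "prelu3": prelu_params(32),
    "conv4-1": conv2d_params(1, 32, 4),  # bounding box regression head
    "conv4-2": conv2d_params(1, 32, 2),  # face / no-face classification head
}

for name, count in layer_params.items():
    print(f"{name:8s} {count:>6,}")
print(f"{'total':8s} {sum(layer_params.values()):>6,}")
```

The per-layer values and the total of 6,632 agree with the summary above, which supports the assumed layer geometry.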

Loading PNet's weights

The model weights are stored in the local folder mtcnn/assets/weights/ under the filename pnet.lz4. They can be loaded with joblib.

In [11]:

import joblib

pnet_weights = joblib.load("../mtcnn/assets/weights/pnet.lz4")

In [12]:

len(pnet_weights)

Out[12]:

13

In [13]:

[w.shape for w in pnet_weights]

Out[13]:

[(3, 3, 3, 10),
 (10,),
 (1, 1, 10),
 (3, 3, 10, 16),
 (16,),
 (1, 1, 16),
 (3, 3, 16, 32),
 (32,),
 (1, 1, 32),
 (1, 1, 32, 4),
 (4,),
 (1, 1, 32, 2),
 (2,)]
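The shapes above can be read as convolution kernels, biases, and PReLU slopes. The sketch below labels each array by its rank alone; this mapping is an inference from the shapes, not something stated by the library, and `describe` is a hypothetical helper. The shapes are copied from the output above so that the example runs without the weights file.

```python
from collections import Counter

# Shapes copied from the output of the previous cell.
shapes = [
    (3, 3, 3, 10), (10,), (1, 1, 10),
    (3, 3, 10, 16), (16,), (1, 1, 16),
    (3, 3, 16, 32), (32,), (1, 1, 32),
    (1, 1, 32, 4), (4,),
    (1, 1, 32, 2), (2,),
]


def describe(shape):
    """Guess the role of a weight array from its rank (an assumption)."""
    if len(shape) == 4:
        return "conv kernel (height, width, in_channels, out_channels)"
    if len(shape) == 3:
        return "PReLU slopes (one per channel)"
    if len(shape) == 1:
        return "bias"
    return "unknown"


roles = [describe(shape) for shape in shapes]
for shape, role in zip(shapes, roles):
    print(f"{str(shape):16s} {role}")

role_counts = Counter(roles)
```

Read this way, the list holds five kernel and bias pairs, one for each Conv2D layer in the summary, and three PReLU slope arrays.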

Further ablation of this stage can be performed by inspecting mtcnn/stages/stage_pnet.py.
