import sys
sys.path.insert(0, "..")
MTCNN PNet
This notebook demonstrates the PNet architecture and its corresponding weights.
PNet is a fully convolutional neural network (CNN) used in the first stage of MTCNN. This network processes inputs of variable size and generates bounding box proposals. It produces two outputs:
- Regression of the bounding box coordinates within the convolutional receptive field.
- Classification of the receptive field into two categories: no-face or face.
An output is produced for each receptive field. Every spatial position of the output map corresponds to one window of the input image (12×12 pixels for PNet), so a larger input yields a larger output map.
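The correspondence between output positions and input windows comes down to simple arithmetic. The sketch below assumes the standard MTCNN PNet geometry: 3×3 "valid" convolutions (matching the kernel shapes of the weights listed at the end of this notebook) and one 2×2 max-pooling layer with stride 2. Together these give each output position a 12×12 receptive field, with neighbouring positions 2 pixels apart. The helper names `pnet_output_size` and `cell_to_window` are hypothetical and not part of the mtcnn package.

```python
# Assumed PNet geometry (standard MTCNN): 3x3 "valid" convolutions and one
# 2x2 max-pooling layer with stride 2. Together they give every output
# position a 12x12 receptive field, with neighbouring positions 2 px apart.
CELL_SIZE = 12  # receptive-field size in input pixels
STRIDE = 2      # input-pixel step between neighbouring output positions


def pnet_output_size(n: int) -> int:
    """Number of full 12x12 windows along an input side of length n.

    For even n this equals PNet's output-map size. For odd n, a pooling
    layer with "same" padding may add one extra border position (the
    pooling padding is an assumption; the model summary does not show it).
    """
    return (n - CELL_SIZE) // STRIDE + 1 if n >= CELL_SIZE else 0


def cell_to_window(row: int, col: int, scale: float = 1.0) -> list:
    """Map output position (row, col) to an [x, y, w, h] window.

    `scale` (hypothetical parameter) is the factor the original image was
    resized by before being fed to PNet; 1.0 means no resizing.
    """
    x = round(col * STRIDE / scale)
    y = round(row * STRIDE / scale)
    size = round(CELL_SIZE / scale)
    return [x, y, size, size]


print(pnet_output_size(12), pnet_output_size(100))  # 1 45
print(cell_to_window(3, 5))                           # [10, 6, 12, 12]
print(cell_to_window(3, 5, scale=0.5))                # [20, 12, 24, 24]
```

MTCNN applies PNet to several rescaled copies of the image (an image pyramid), which is why the sketch takes a scale factor when mapping a window back to original-image coordinates.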
In the following sections, we will run MTCNN with only the PNet stage enabled and visualize the boxes it proposes. We will then inspect the network's layers and output shapes, and load its weights.
Running MTCNN up to the PNet Stage
MTCNN can be configured to run only up to the first stage, which will provide the direct output of the PNet stage.
from mtcnn import MTCNN
from mtcnn.utils.images import load_image
from mtcnn.utils.tensorflow import set_gpu_memory_growth
from mtcnn.stages import StagePNet
# Enable GPU memory growth so TensorFlow does not reserve more GPU memory than it needs (only relevant when running on a GPU)
set_gpu_memory_growth()
image = load_image("../resources/ivan.jpg")
mtcnn = MTCNN(stages=[StagePNet], device="CPU:0")  # other options: "GPU:0", "GPU:1", ...
%%time
result = mtcnn.detect_faces(image, postprocess=True)
CPU times: user 413 ms, sys: 111 ms, total: 524 ms
Wall time: 310 ms
result
[{'box': [270, 89, 61, 61], 'confidence': 0.9999668598175049},
{'box': [271, 89, 71, 71], 'confidence': 0.9997212290763855},
{'box': [490, 209, 54, 54], 'confidence': 0.9992153644561768},
{'box': [187, 243, 38, 38], 'confidence': 0.998630166053772},
{'box': [480, 285, 57, 57], 'confidence': 0.9982782602310181},
{'box': [296, 100, 32, 32], 'confidence': 0.9957242012023926},
{'box': [192, 43, 108, 108], 'confidence': 0.9916715025901794},
{'box': [101, 408, 42, 42], 'confidence': 0.9912404417991638},
{'box': [97, 405, 52, 52], 'confidence': 0.9852192401885986},
{'box': [11, 180, 43, 43], 'confidence': 0.9849668145179749},
{'box': [8, 386, 31, 31], 'confidence': 0.9844192862510681},
{'box': [394, 399, 48, 48], 'confidence': 0.9816769361495972},
{'box': [14, 313, 40, 40], 'confidence': 0.9804034233093262},
{'box': [184, 59, 18, 18], 'confidence': 0.9791208505630493},
{'box': [495, 143, 58, 58], 'confidence': 0.9790045022964478},
{'box': [286, 218, 62, 62], 'confidence': 0.9768547415733337},
{'box': [344, 132, 20, 20], 'confidence': 0.9743143916130066},
{'box': [403, 394, 41, 41], 'confidence': 0.9722734093666077},
{'box': [180, 241, 46, 46], 'confidence': 0.9710206985473633},
{'box': [496, 214, 41, 41], 'confidence': 0.9705135822296143},
{'box': [275, 104, 30, 30], 'confidence': 0.9698752164840698},
{'box': [144, 391, 78, 78], 'confidence': 0.9693538546562195},
{'box': [4, 176, 54, 54], 'confidence': 0.9685015082359314},
{'box': [187, 140, 40, 40], 'confidence': 0.9677426218986511},
{'box': [283, 99, 45, 45], 'confidence': 0.967420756816864},
{'box': [534, 382, 20, 20], 'confidence': 0.9653154611587524},
{'box': [271, 99, 45, 45], 'confidence': 0.9631991386413574},
{'box': [101, 509, 17, 17], 'confidence': 0.9630200862884521},
{'box': [499, 289, 39, 39], 'confidence': 0.961385190486908},
{'box': [290, 124, 32, 32], 'confidence': 0.9606941938400269},
{'box': [334, 128, 28, 28], 'confidence': 0.9601700305938721},
{'box': [250, 104, 21, 21], 'confidence': 0.9600563049316406},
{'box': [182, 98, 19, 19], 'confidence': 0.9569499492645264},
{'box': [338, 152, 19, 19], 'confidence': 0.9563547968864441},
{'box': [8, 235, 58, 58], 'confidence': 0.9557236433029175},
{'box': [1, 386, 40, 40], 'confidence': 0.9545572400093079},
{'box': [513, 371, 39, 39], 'confidence': 0.9491947293281555},
{'box': [322, 191, 27, 27], 'confidence': 0.9456313848495483},
{'box': [470, 50, 53, 53], 'confidence': 0.9440603852272034},
{'box': [100, 411, 30, 30], 'confidence': 0.9404458999633789},
{'box': [31, 341, 32, 32], 'confidence': 0.937527060508728},
{'box': [323, 188, 20, 20], 'confidence': 0.9356555938720703},
{'box': [489, 434, 29, 29], 'confidence': 0.9347164630889893},
{'box': [355, 260, 18, 18], 'confidence': 0.9298021197319031},
{'box': [1, 396, 21, 21], 'confidence': 0.9291993975639343},
{'box': [270, 56, 147, 147], 'confidence': 0.9255051016807556},
{'box': [476, 270, 73, 73], 'confidence': 0.924798309803009},
{'box': [506, 294, 22, 22], 'confidence': 0.9207442402839661},
{'box': [73, 58, 225, 225], 'confidence': 0.9173569083213806},
{'box': [262, 71, 101, 101], 'confidence': 0.9164451956748962},
{'box': [13, 72, 31, 31], 'confidence': 0.9129998683929443},
{'box': [26, 340, 39, 39], 'confidence': 0.9100756049156189},
{'box': [239, 97, 31, 31], 'confidence': 0.9052125215530396},
{'box': [148, 405, 36, 36], 'confidence': 0.8971834778785706},
{'box': [445, 379, 43, 43], 'confidence': 0.8947854042053223},
{'box': [446, 215, 22, 22], 'confidence': 0.8917657136917114},
{'box': [239, 233, 81, 81], 'confidence': 0.8911052346229553},
{'box': [220, 287, 20, 20], 'confidence': 0.8855998516082764},
{'box': [36, 341, 24, 24], 'confidence': 0.8843594193458557},
{'box': [481, 198, 76, 76], 'confidence': 0.8838769197463989},
{'box': [17, 390, 21, 21], 'confidence': 0.8799570202827454},
{'box': [4, 303, 55, 55], 'confidence': 0.8785687685012817},
{'box': [430, 217, 19, 19], 'confidence': 0.8763736486434937},
{'box': [206, 79, 23, 23], 'confidence': 0.8737393617630005},
{'box': [7, 73, 42, 42], 'confidence': 0.8733800053596497},
{'box': [174, 127, 72, 72], 'confidence': 0.8731698393821716},
{'box': [280, 106, 22, 22], 'confidence': 0.8657463192939758},
{'box': [523, 456, 21, 21], 'confidence': 0.8632909059524536},
{'box': [62, 349, 28, 28], 'confidence': 0.8600795865058899},
{'box': [476, 63, 28, 28], 'confidence': 0.8581259250640869},
{'box': [489, 434, 35, 35], 'confidence': 0.8565669059753418},
{'box': [24, 367, 20, 20], 'confidence': 0.853937566280365},
{'box': [3, 176, 72, 72], 'confidence': 0.8522983193397522},
{'box': [0, 297, 20, 20], 'confidence': 0.851826012134552},
{'box': [42, 358, 78, 78], 'confidence': 0.8504625558853149},
{'box': [342, 102, 23, 23], 'confidence': 0.8466385006904602},
{'box': [335, 148, 26, 26], 'confidence': 0.8402417302131653},
{'box': [374, 395, 77, 77], 'confidence': 0.837632417678833},
{'box': [293, 160, 30, 30], 'confidence': 0.8371832370758057},
{'box': [107, 369, 150, 150], 'confidence': 0.8341783881187439},
{'box': [283, 148, 31, 31], 'confidence': 0.8329155445098877},
{'box': [18, 72, 23, 23], 'confidence': 0.8310617804527283},
{'box': [533, 271, 20, 20], 'confidence': 0.8309110403060913},
{'box': [2, 314, 43, 43], 'confidence': 0.8295050859451294},
{'box': [2, 247, 40, 40], 'confidence': 0.8290241956710815},
{'box': [136, 387, 97, 97], 'confidence': 0.8286371827125549},
{'box': [301, 220, 49, 49], 'confidence': 0.8285456299781799},
{'box': [22, 184, 31, 31], 'confidence': 0.8255282044410706},
{'box': [143, 419, 28, 28], 'confidence': 0.8249657154083252},
{'box': [10, 74, 22, 22], 'confidence': 0.8228946924209595},
{'box': [190, 2, 22, 22], 'confidence': 0.8213641047477722},
{'box': [424, 483, 34, 34], 'confidence': 0.8204600214958191},
{'box': [201, 205, 22, 22], 'confidence': 0.81780606508255},
{'box': [189, 120, 30, 30], 'confidence': 0.8163595795631409},
{'box': [10, 132, 29, 29], 'confidence': 0.8141602277755737},
{'box': [39, 217, 23, 23], 'confidence': 0.8135595321655273},
{'box': [185, 128, 58, 58], 'confidence': 0.810321569442749},
{'box': [173, 424, 20, 20], 'confidence': 0.8083855509757996},
{'box': [435, 212, 33, 33], 'confidence': 0.8042281866073608},
{'box': [206, 62, 21, 21], 'confidence': 0.8023461699485779},
{'box': [498, 152, 30, 30], 'confidence': 0.8022951483726501},
{'box': [49, 377, 56, 56], 'confidence': 0.8021646738052368},
{'box': [511, 33, 40, 40], 'confidence': 0.8009828925132751},
{'box': [31, 341, 79, 79], 'confidence': 0.7994623184204102},
{'box': [455, 401, 79, 79], 'confidence': 0.7946075201034546},
{'box': [153, 112, 102, 102], 'confidence': 0.7888069152832031},
{'box': [188, 96, 60, 60], 'confidence': 0.7880174517631531},
{'box': [191, 121, 21, 21], 'confidence': 0.7873377799987793},
{'box': [103, 53, 170, 170], 'confidence': 0.7869991064071655},
{'box': [161, 31, 154, 154], 'confidence': 0.7862122654914856},
{'box': [339, 172, 28, 28], 'confidence': 0.7811397314071655},
{'box': [194, 135, 26, 26], 'confidence': 0.7713541388511658},
{'box': [524, 267, 28, 28], 'confidence': 0.7680309414863586},
{'box': [319, 164, 19, 19], 'confidence': 0.7631727457046509},
{'box': [236, 101, 37, 37], 'confidence': 0.7625581622123718},
{'box': [2, 1, 57, 57], 'confidence': 0.7596020698547363},
{'box': [278, 136, 46, 46], 'confidence': 0.7581404447555542},
{'box': [284, 153, 24, 24], 'confidence': 0.7557078003883362},
{'box': [221, 212, 150, 150], 'confidence': 0.753204882144928},
{'box': [513, 368, 30, 30], 'confidence': 0.7531015276908875},
{'box': [464, 454, 21, 21], 'confidence': 0.74482661485672},
{'box': [499, 148, 39, 39], 'confidence': 0.7422949075698853},
{'box': [277, 135, 56, 56], 'confidence': 0.7366361618041992},
{'box': [304, 28, 59, 59], 'confidence': 0.7317830920219421},
{'box': [503, 293, 30, 30], 'confidence': 0.729342520236969},
{'box': [486, 333, 23, 23], 'confidence': 0.728617250919342},
{'box': [189, 142, 29, 29], 'confidence': 0.7246003746986389},
{'box': [356, 387, 21, 21], 'confidence': 0.7240045070648193},
{'box': [184, 205, 23, 23], 'confidence': 0.723656177520752},
{'box': [334, 99, 38, 38], 'confidence': 0.7213565707206726},
{'box': [501, 27, 51, 51], 'confidence': 0.7170071005821228},
{'box': [273, 266, 38, 38], 'confidence': 0.7144962549209595},
{'box': [252, 493, 40, 40], 'confidence': 0.7130072116851807},
{'box': [453, 215, 20, 20], 'confidence': 0.706762969493866},
{'box': [63, 396, 43, 43], 'confidence': 0.7053548693656921},
{'box': [313, 189, 39, 39], 'confidence': 0.7040255069732666},
{'box': [15, 241, 31, 31], 'confidence': 0.6972864866256714},
{'box': [219, 161, 18, 18], 'confidence': 0.6943190693855286},
{'box': [43, 9, 31, 31], 'confidence': 0.6927041411399841},
{'box': [303, 5, 27, 27], 'confidence': 0.6924176812171936},
{'box': [301, 259, 53, 53], 'confidence': 0.6918803453445435},
{'box': [478, 319, 40, 40], 'confidence': 0.6887754201889038},
{'box': [67, 508, 58, 52], 'confidence': 0.6868264079093933},
{'box': [184, 112, 43, 43], 'confidence': 0.6865329742431641},
{'box': [334, 135, 18, 18], 'confidence': 0.6855722069740295},
{'box': [36, 350, 23, 23], 'confidence': 0.6833070516586304},
{'box': [177, 95, 25, 25], 'confidence': 0.6830892562866211},
{'box': [159, 420, 38, 38], 'confidence': 0.682868480682373},
{'box': [318, 138, 19, 19], 'confidence': 0.6816803216934204},
{'box': [263, 423, 29, 29], 'confidence': 0.6813008189201355},
{'box': [284, 199, 20, 20], 'confidence': 0.6787427663803101},
{'box': [67, 352, 21, 21], 'confidence': 0.6717443466186523},
{'box': [481, 23, 74, 74], 'confidence': 0.6704385876655579},
{'box': [523, 452, 31, 31], 'confidence': 0.6700493097305298},
{'box': [243, 334, 76, 76], 'confidence': 0.6653152108192444},
{'box': [454, 338, 29, 29], 'confidence': 0.6650230884552002},
{'box': [49, 95, 22, 22], 'confidence': 0.6635971069335938},
{'box': [321, 84, 55, 55], 'confidence': 0.6603143215179443},
{'box': [480, 325, 31, 31], 'confidence': 0.6586322784423828},
{'box': [294, 135, 24, 24], 'confidence': 0.6576036810874939},
{'box': [60, 347, 39, 39], 'confidence': 0.6554562449455261},
{'box': [458, 406, 21, 21], 'confidence': 0.65467768907547},
{'box': [342, 138, 23, 23], 'confidence': 0.6540101766586304},
{'box': [540, 441, 20, 22], 'confidence': 0.653633713722229},
{'box': [300, 127, 25, 25], 'confidence': 0.6521259546279907},
{'box': [170, 133, 54, 54], 'confidence': 0.6484688520431519},
{'box': [20, 192, 22, 22], 'confidence': 0.644957959651947},
{'box': [518, 296, 28, 28], 'confidence': 0.6440291404724121},
{'box': [245, 522, 43, 38], 'confidence': 0.6340025067329407},
{'box': [436, 367, 58, 58], 'confidence': 0.6332893967628479},
{'box': [234, 233, 108, 108], 'confidence': 0.6274054646492004},
{'box': [28, 85, 53, 53], 'confidence': 0.6244142055511475},
{'box': [254, 502, 30, 30], 'confidence': 0.624413788318634},
{'box': [319, 182, 37, 37], 'confidence': 0.6236416101455688},
{'box': [29, 21, 31, 31], 'confidence': 0.6222331523895264},
{'box': [9, 182, 33, 33], 'confidence': 0.6211090683937073},
{'box': [17, 248, 21, 21], 'confidence': 0.6192639470100403},
{'box': [141, 398, 54, 54], 'confidence': 0.618570864200592},
{'box': [74, 386, 30, 30], 'confidence': 0.6184467673301697},
{'box': [198, 203, 28, 28], 'confidence': 0.6183221936225891},
{'box': [336, 103, 22, 22], 'confidence': 0.6169424653053284},
{'box': [253, 530, 30, 30], 'confidence': 0.6161786317825317},
{'box': [199, 58, 77, 77], 'confidence': 0.6141642332077026},
{'box': [510, 87, 41, 41], 'confidence': 0.6061983704566956},
{'box': [23, 212, 39, 39], 'confidence': 0.6061719655990601},
{'box': [292, 267, 63, 63], 'confidence': 0.605388343334198},
{'box': [446, 25, 112, 112], 'confidence': 0.604427695274353},
{'box': [342, 147, 20, 20], 'confidence': 0.6038945317268372},
{'box': [33, 249, 30, 30], 'confidence': 0.6038562655448914}]
The output is a list of candidate detections. Each one holds a bounding box, given as [x, y, width, height] in pixels, and a confidence score. The following cell plots them over the input image:
from mtcnn.utils.plotting import plot
import matplotlib.pyplot as plt
plt.imshow(plot(image, result))
As the plot shows, PNet proposes many candidate boxes (189 for this image), and most of them do not contain a face. Discarding the false positives and refining the remaining boxes is the job of the next stage, RNet.
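In the MTCNN design, overlapping candidates are also merged with non-maximum suppression (NMS) between stages. The sketch below is a plain greedy NMS over boxes in the [x, y, width, height] format shown above. The function names and the 0.5 IoU threshold are illustrative choices and do not reflect the mtcnn package's internals.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, drop boxes overlapping it, repeat.

    `detections` is a list of dicts with 'box' ([x, y, w, h]) and
    'confidence', like the PNet output above. Returns the kept indices.
    """
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i]["confidence"], reverse=True)
    keep = []
    for i in order:
        if all(iou(detections[i]["box"], detections[k]["box"]) <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep


# The first three proposals from the PNet output above.
sample = [
    {'box': [270, 89, 61, 61], 'confidence': 0.9999668598175049},
    {'box': [271, 89, 71, 71], 'confidence': 0.9997212290763855},
    {'box': [490, 209, 54, 54], 'confidence': 0.9992153644561768},
]
print(nms(sample))  # [0, 2]: the second box overlaps the first (IoU ~0.72)
```

In the notebook, the same function could be applied to the full list, for example `[result[i] for i in nms(result)]`. Greedy NMS only removes duplicates; it cannot tell a face from a face-like background patch, which is why a second network is still needed.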
Accessing PNet's model
The network can be accessed by instantiating StagePNet and reading its model attribute, which holds a TensorFlow (Keras) model.
stage = StagePNet()
model = stage.model
model.summary()
Model: "p_net_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv1 (Conv2D)                  │ (None, None, None, 10) │           280 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu1 (PReLU)                  │ (None, None, None, 10) │            10 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ maxpooling1 (MaxPooling2D)      │ (None, None, None, 10) │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2 (Conv2D)                  │ (None, None, None, 16) │         1,456 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu2 (PReLU)                  │ (None, None, None, 16) │            16 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv3 (Conv2D)                  │ (None, None, None, 32) │         4,640 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ prelu3 (PReLU)                  │ (None, None, None, 32) │            32 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv4-1 (Conv2D)                │ (None, None, None, 4)  │           132 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv4-2 (Conv2D)                │ (None, None, None, 2)  │            66 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 6,632 (25.91 KB)
Trainable params: 6,632 (25.91 KB)
Non-trainable params: 0 (0.00 B)
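The parameter counts in the summary follow directly from the layer shapes. A Conv2D layer has kh·kw·c_in·c_out kernel weights plus c_out biases. Each PReLU layer here has one slope per channel, since its 10, 16 and 32 parameters imply that the slopes are shared across the spatial axes. In the sketch below, the kernel sizes are taken from the weight shapes listed at the end of this notebook. The layer list is written out by hand and is not read from the model.

```python
# (name, kernel_h, kernel_w, in_channels, out_channels) for each Conv2D layer,
# with kernel sizes matching the PNet weight shapes.
CONV_LAYERS = [
    ("conv1", 3, 3, 3, 10),
    ("conv2", 3, 3, 10, 16),
    ("conv3", 3, 3, 16, 32),
    ("conv4-1", 1, 1, 32, 4),
    ("conv4-2", 1, 1, 32, 2),
]
# PReLU layers: one learned slope per channel.
PRELU_CHANNELS = {"prelu1": 10, "prelu2": 16, "prelu3": 32}


def conv_params(kh, kw, cin, cout):
    """Kernel weights plus one bias per output channel."""
    return kh * kw * cin * cout + cout


counts = {name: conv_params(kh, kw, cin, cout)
          for name, kh, kw, cin, cout in CONV_LAYERS}
counts.update(PRELU_CHANNELS)  # max pooling adds no parameters

total = sum(counts.values())
print(counts)
print(total, f"({total * 4 / 1024:.2f} KB as float32)")  # 6632 (25.91 KB as float32)
```

The 25.91 KB figure in the summary is consistent with each parameter being stored as a 4-byte float32.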
Loading PNet's weights
The model weights are stored in the local folder mtcnn/assets/weights/, in the file pnet.lz4. They can be loaded with joblib:
import joblib
pnet_weights = joblib.load("../mtcnn/assets/weights/pnet.lz4")
len(pnet_weights)
13
[w.shape for w in pnet_weights]
[(3, 3, 3, 10), (10,), (1, 1, 10), (3, 3, 10, 16), (16,), (1, 1, 16), (3, 3, 16, 32), (32,), (1, 1, 32), (1, 1, 32, 4), (4,), (1, 1, 32, 2), (2,)]
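The 13 arrays correspond to the layers in the model summary. Each Conv2D layer contributes a kernel and a bias, and each PReLU layer contributes a (1, 1, C) slope array, in the order conv1, prelu1, conv2, prelu2, conv3, prelu3, conv4-1, conv4-2. Their sizes add up to the 6,632 parameters reported by model.summary(). To show how the arrays are used, the sketch below runs a PNet forward pass in plain NumPy (1.20 or later, for sliding_window_view). It is an illustration under assumptions and not the package's implementation. It assumes the 2×2 pooling uses stride 2 with "same" padding and that a softmax is applied to the 2-channel conv4-2 output. Input preprocessing, such as pixel normalization, is left out. The function names, including pnet_forward, are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, kernel, bias):
    """'Valid' 2D convolution, stride 1. x: (H, W, Cin), kernel: (kh, kw, Cin, Cout)."""
    kh, kw = kernel.shape[:2]
    # windows has shape (H-kh+1, W-kw+1, Cin, kh, kw)
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, kernel) + bias


def prelu(x, alpha):
    """PReLU with one slope per channel; alpha has shape (1, 1, C)."""
    return np.where(x > 0, x, alpha * x)


def maxpool_2x2_same(x):
    """2x2 max pooling with stride 2; odd sizes are padded at the bottom/right."""
    h, w, c = x.shape
    h2, w2 = -(-h // 2), -(-w // 2)  # ceiling division
    padded = np.full((h2 * 2, w2 * 2, c), -np.inf, dtype=x.dtype)
    padded[:h, :w] = x
    return padded.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))


def softmax(x):
    """Softmax over the last (channel) axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def pnet_forward(image, weights):
    """Run PNet on one (H, W, 3) float image using the 13 weight arrays.

    Returns (bbox_regression, class_probs) with shapes (h, w, 4) and (h, w, 2).
    """
    (k1, b1, a1, k2, b2, a2, k3, b3, a3, k41, b41, k42, b42) = weights
    x = prelu(conv2d(image, k1, b1), a1)
    x = maxpool_2x2_same(x)
    x = prelu(conv2d(x, k2, b2), a2)
    x = prelu(conv2d(x, k3, b3), a3)
    regression = conv2d(x, k41, b41)      # conv4-1: 4 box-regression values
    probs = softmax(conv2d(x, k42, b42))  # conv4-2: 2 class scores (no-face/face)
    return regression, probs


# Random weights with the shapes listed above: only the output shapes are meaningful.
rng = np.random.default_rng(0)
shapes = [(3, 3, 3, 10), (10,), (1, 1, 10), (3, 3, 10, 16), (16,), (1, 1, 16),
          (3, 3, 16, 32), (32,), (1, 1, 32), (1, 1, 32, 4), (4,), (1, 1, 32, 2), (2,)]
weights = [rng.standard_normal(s).astype(np.float32) for s in shapes]
image = rng.standard_normal((24, 30, 3)).astype(np.float32)
reg, probs = pnet_forward(image, weights)
print(reg.shape, probs.shape)  # (7, 10, 4) (7, 10, 2)
```

In the notebook, pnet_forward(img, pnet_weights) would use the real weights. Whether its output matches the Keras model exactly depends on the padding and preprocessing assumptions stated above.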
The implementation of this stage can be explored further in mtcnn/stages/stage_pnet.py.