Trueface SDK Benchmarks
A typical 1 to N face recognition pipeline involves the following steps:
- Preprocess Image
- Face and landmark detection
- Face recognition template extraction
- 1 to N identification search
To estimate the total pipeline time, add up the time for each individual step. You may also need to account for the time taken to decode your video stream.
If you choose to run face recognition on every face in your image, you will need to repeat steps 3 and 4 (template extraction and identification search) for every detected face.
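The addition described above can be sketched in a few lines. This is only an illustrative calculation, not part of the SDK: the function name `estimate_pipeline_ms` and its parameters are hypothetical, and the sample figures are the i9-11900H values from the CPU table (TFV5 model, N = 1,000,000). Video decode time is not included.

```python
def estimate_pipeline_ms(preprocess_ms, detect_ms, extract_ms, search_ms, num_faces=1):
    """Estimate total 1 to N pipeline time for one image, in milliseconds.

    Hypothetical helper for illustration; not an SDK function.
    """
    if num_faces < 0:
        raise ValueError("num_faces must be non-negative")
    # Steps 1 and 2 (preprocessing, face and landmark detection) run once per image.
    per_image_ms = preprocess_ms + detect_ms
    # Steps 3 and 4 (template extraction, identification search) run once per face.
    per_face_ms = extract_ms + search_ms
    return per_image_ms + num_faces * per_face_ms


# Sample figures taken from the i9-11900H column of the CPU table.
one_face = estimate_pipeline_ms(3.9, 4.2, 48.0, 48.6, num_faces=1)
three_faces = estimate_pipeline_ms(3.9, 4.2, 48.0, 48.6, num_faces=3)
print(f"1 face: {one_face:.1f} ms, 3 faces: {three_faces:.1f} ms")
```

Under these assumptions a single-face image comes to roughly 105 ms, and most of the cost of additional faces comes from the per-face steps.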
Trueface SDK Benchmarks: CPU
Operation | RAM Req. | i9-11900H | AWS c7g.2xlarge (ARM) | AWS c6i.2xlarge (x86) | RPi 4, AArch64 | Android (Snapdragon 8 Gen 1) | iPad Pro M1 |
Preprocess Image | - | 3.9 ms | 6.0 ms | 5.3 ms | 18.5 ms | ||
Face and landmark detection | 30 Mb | 4.2 ms | 6.2 ms | 7.2 ms | 79.6 ms | 4 ms | 6.5 ms |
106 face landmark detection | - | 1.7 ms | 2.7 ms | 2.5 ms | 20.2 ms | ||
Head orientation detection | - | 0.01 ms | 0.015 ms | 0.015 ms | 0.055 ms | ||
Mask detection | 385 Mb | 10.3 ms | 10.0 ms | 12.2 ms | 166.6 ms | ||
Blink detection | 60 Mb | 4.4 ms | 8.6 ms | 7.2 ms | 46.2 ms | ||
Passive spoof detection | 500 Mb | 22 ms | 30 ms | 35.4 ms | 311.7 ms | ||
Object detection | 190 Mb | 35.6 ms | 28.6 ms | 61 ms | 496.5 ms | ||
Face recognition template extraction, LITE model | 73 Mb | 10.9 ms | 6.6 ms | 15.5 ms | 72.8 ms | 56 ms | 4.5 ms |
Face recognition template extraction, TFV5 model | 2 Gb | 48 ms | 40.0 ms | 61.5 ms | 1152.3 ms | 766 ms | 57.6 ms |
Face recognition template extraction, TFV6 model | 2 Gb | 48 ms | 40.0 ms | 62.7 ms | 1173.5 ms | 261 ms | 57.3 ms |
1 to N identification search (N = 1,000,000) TFV5 | 2250 bytes / template | 48.6 ms | 16.8 ms | 47.9 ms | 571.6 ms | ||
1 to N batch identification search (N = 1,000,000) TFV5, vector compression | 1250 bytes / template | 9.4 ms | 9.3 ms | 17.2 ms | 180.8 ms | ||
1 to N identification search times scale linearly with collection size, so for a collection of size 10,000, divide the times reported above by 100.
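As a small worked example of the linear scaling rule, the sketch below rescales a reported search time to a different collection size. The helper name `scale_search_ms` is hypothetical, and the calculation simply assumes that the linear relationship stated above holds across the whole range.

```python
REFERENCE_N = 1_000_000  # collection size used for the reported search times


def scale_search_ms(reference_ms, collection_size, reference_size=REFERENCE_N):
    """Rescale a reported search time, assuming time is linear in collection size."""
    if collection_size < 0 or reference_size <= 0:
        raise ValueError("collection sizes must be positive")
    return reference_ms * collection_size / reference_size


# i9-11900H, TFV5, N = 1,000,000 is reported as 48.6 ms.
scaled = scale_search_ms(48.6, 10_000)
print(f"Estimated search time for N = 10,000: {scaled:.3f} ms")
```

For N = 10,000 this is the same as dividing the reported time by 100.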
All benchmarks were performed using 1280x720 pixel images containing 1 face or object, with CPU only and the smallest face height set to 40 px. RAM usage refers to maximum resident memory. Batch identification was tested with 100 probe templates and is used to increase throughput. Enrollment template size represents a conservative average case; it can vary because the identity string has variable length. The TFV5 and FULL models have the same inference time. All benchmarks were run on Ubuntu 20.04.
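The per-template sizes listed in the table (2250 bytes, or 1250 bytes with vector compression) can be turned into a rough memory estimate for a collection. The sketch below is illustrative only: the function names are hypothetical, and because the per-template size is a conservative average, the result should be read as an approximation rather than an exact requirement.

```python
def collection_memory_bytes(num_templates, bytes_per_template):
    """Approximate memory needed to hold a collection of enrollment templates."""
    if num_templates < 0 or bytes_per_template < 0:
        raise ValueError("inputs must be non-negative")
    return num_templates * bytes_per_template


def to_gigabytes(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


# Per-template sizes taken from the CPU table, for N = 1,000,000.
full = collection_memory_bytes(1_000_000, 2250)
compressed = collection_memory_bytes(1_000_000, 1250)
print(f"Uncompressed: {to_gigabytes(full):.2f} GB")
print(f"Vector compression: {to_gigabytes(compressed):.2f} GB")
```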
Trueface SDK Benchmarks: GPU
Operation | VRAM Usage | RTX 3080 Laptop GPU | AWS g4dn.xlarge (T4) |
Preprocess Image | - | 5.3 ms | 8.1 ms |
Face and landmark detection | 1.9 Gb | 2.7 ms | 4.9 ms |
Object Detection | 1 Gb | 4.2 ms | 7.7 ms |
Face recognition template extraction TFV5, batch size = 1 | 2.73 Gb | 8.7 ms | 13.1 ms |
Face recognition template extraction TFV5, batch size = 4 | 3.22 Gb | 4.1 ms | 6.9 ms |
Face recognition template extraction TFV5, batch size = 16 | 4.91 Gb | 2.6 ms | 4.7 ms |
Face recognition template extraction TFV5, batch size = 32 | 6.85 Gb | 2.3 ms | 4.3 ms |
Face recognition template extraction TFV6, batch size = 1 | 1.5 Gb | 2.2 ms | 3.9 ms |
Face recognition template extraction TFV6, batch size = 4 | 1.5 Gb | 1.1 ms | 2.0 ms |
Face recognition template extraction TFV6, batch size = 16 | 1.5 Gb | 0.93 ms | 1.4 ms |
Face recognition template extraction TFV6, batch size = 32 | 1.5 Gb | 0.85 ms | 1.2 ms |
Mask Detection, batch size = 1 | 1.3 Gb | 0.87 ms | 1.2 ms |
Mask Detection, batch size = 4 | 1.3 Gb | 0.23 ms | 0.37 ms |
1 to N identification search is performed on CPU only, so refer to CPU times.
How batching times are reported:
With a batch size of 4, we generate 4 face recognition templates at the same time. The total time taken to generate those templates is 16.4 ms, meaning the average time per template is 4.1 ms.
With a batch size of 64, we generate 64 face recognition templates at the same time. The total time taken to generate those templates is 136.96 ms, meaning the average time per template is 2.14 ms.
Operations that do not list a batch size support only a batch size of 1 at this time.
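The relationship between the reported per-template time and the total time for a batch can be checked with a short calculation. The helper names below are hypothetical, and the throughput figure assumes batches are processed back to back with no other overhead, which may not hold in a real deployment.

```python
def batch_total_ms(per_template_ms, batch_size):
    """Total time to process one batch, given the reported average per template."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return per_template_ms * batch_size


def templates_per_second(per_template_ms):
    """Idealized throughput implied by an average per-template time."""
    if per_template_ms <= 0:
        raise ValueError("per_template_ms must be positive")
    return 1000.0 / per_template_ms


# The two examples from the text: batch sizes of 4 and 64.
total_4 = batch_total_ms(4.1, 4)
total_64 = batch_total_ms(2.14, 64)
print(f"Batch of 4: {total_4:.2f} ms total")
print(f"Batch of 64: {total_64:.2f} ms total")
print(f"Throughput at 4.1 ms per template: {templates_per_second(4.1):.1f} templates/s")
```

Larger batches lower the average time per template but raise the latency of each individual batch, so the right batch size depends on whether throughput or latency matters more.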
All benchmarks were performed using 1280x720 pixel images containing 1 face or object, with GPU enabled and the smallest face height set to 40 px. GPU benchmarks use FP16 inference where applicable (TFV6).
Trueface On-Prem Benchmarks
Operation | RAM Req. | AWS c6i.2xlarge (CPU) | AWS g4dn.xlarge (GPU) |
Templatize Face | 2.1 Gb | 90 ms | 43 ms |
Match Faces | 2.1 Gb | 190 ms | 87 ms |
Enroll Face | 2.1 Gb | 88 ms | 45 ms |
Identify Face (N = 1000) | 2.1 Gb | 85 ms | 44 ms |
Spoof Detection | 2.5 Gb | 37 ms | 75 ms |
All benchmarks were performed using a 1280x720 px JPG image on disk, with the default smallest face height of 100. Requests were sent from the same machine running the PTOP server to avoid any network overhead in the measurements.