Benchmarks

A typical 1 to N face recognition pipeline involves the following steps:

  1. Preprocess Image
  2. Face and landmark detection
  3. Face recognition template extraction
  4. 1 to N identification search

To estimate the total pipeline time, add up the time for each individual step. You may also need to account for the time taken to decode your video stream.
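The summation described above can be sketched as follows. This is a minimal illustration, not part of the SDK: the function and dictionary names are hypothetical, and the per-step values are taken from the first speed figure listed for each step in the CPU benchmarks on this page (assumed here to be the i9-11900H figures, using the TFV5 model and a collection of 1,000,000 templates).

```python
# Hypothetical helper, not an SDK API: sums per-step latencies in milliseconds.
# Values are assumed to be the i9-11900H CPU figures reported on this page.
STEP_TIMES_MS = {
    "preprocess_image": 3.93,
    "face_and_landmark_detection": 4.2,
    "template_extraction_tfv5": 48.0,
    "identification_search_1m_tfv5": 48.6,
}


def estimate_pipeline_ms(step_times_ms, decode_ms=0.0):
    """Return the estimated end-to-end time: sum of all steps plus video decode time."""
    return sum(step_times_ms.values()) + decode_ms


total_ms = estimate_pipeline_ms(STEP_TIMES_MS)
print(f"Estimated pipeline time: {total_ms:.2f} ms")
```

If your video decoder adds a measurable delay per frame, pass it as `decode_ms` so it is included in the estimate.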

SDK Benchmarks - CPU

| Operation | RAM Req. | Speed i9-11900H | Speed i7-10870H | Speed Dual Xeon E5-2630 v4 | Speed Jetson Xavier NX, 15W 6 core | Speed Jetson Xavier NX, 10W 4 core | Speed Jetson Nano, Max | Speed RPi 4, AArch64 | Android (Snapdragon 8 Gen 1 / 12 GB RAM) |
|---|---|---|---|---|---|---|---|---|---|
| Preprocess Image | - | 3.93 ms | 5.1 ms | | | | | | |
| Object detection | 60 MB | 6.9 ms | 8.7 ms | 13 ms | 30 ms | 37.6 ms | 62 ms | 85 ms | |
| Face and landmark detection | 30 MB | 4.2 ms | 5.8 ms | 10 ms | 32 ms | 43 ms | 50 ms | 70 ms | |
| 106 face landmark detection | | 1.9 ms | | | | | | | |
| Head orientation detection | | 0.01 ms | | | | | | | |
| Face recognition template extraction, LITE model | 25 MB | 2.6 ms | 3.4 ms | 5.3 ms | 20 ms | 28 ms | 21 ms | 35 ms | 14.1 ms |
| Face recognition template extraction, LITE V2 model | 73 MB | 10.9 ms | 13.2 ms | 21.6 ms | 34 ms | 41 ms | 56 ms | 80 ms | 12.9 ms |
| Face recognition template extraction, TFV5 model | 2 GB | 48 ms | 71 ms | 67 ms | 205 ms | 280 ms | 575 ms | 1200 ms | 152.5 ms |
| Face recognition template extraction, TFV6 model | 2 GB | 48 ms | 71 ms | 67 ms | 205 ms | 280 ms | 575 ms | 1200 ms | 265.4 ms |
| 1 to N identification search (N = 1,000,000) TFV5 | 2250 bytes / template | 48.6 ms | 56.82 ms | 44.46 ms | 60.49 ms | 89.28 ms | - | 559.85 ms | |
| 1 to N identification search (N = 1,000,000) TFV5, vector compression | 1250 bytes / template | 25.3 ms | 30.97 ms | 25.37 ms | 36.74 ms | 57.06 ms | 175.52 ms | 293.29 ms | |
| 1 to N batch identification search (N = 1,000,000) TFV5 | 2250 bytes / template | 19.04 ms | 16.91 ms | 22.41 ms | 62.24 ms | 90.50 ms | - | 350.70 ms | |
| 1 to N batch identification search (N = 1,000,000) TFV5, vector compression | 1250 bytes / template | 9.4 ms | 8.73 ms | 11.71 ms | 34.62 ms | 57.84 ms | 146.68 ms | 191.99 ms | |
| Mask detection | 385 MB | 13.1 ms | | | | | | | |
| Blink detection | 60 MB | 4.4 ms | | | | | | | |
| Passive spoof detection | 500 MB | 8.3 ms | | | | | | | |

1 to N identification search times scale linearly, so for a collection of size 10,000, divide the above reported times by 100.
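As a rough sketch of that linear scaling, the helper below rescales a time reported for N = 1,000,000 to another collection size. The function name is hypothetical and not part of the SDK, and the 48.6 ms input is assumed to be the i9-11900H figure for TFV5 search.

```python
# Hypothetical helper, not an SDK API: rescales a reported 1 to N search time,
# assuming search time is proportional to collection size.
REFERENCE_COLLECTION_SIZE = 1_000_000  # N used for the reported benchmarks


def scale_search_time_ms(reported_ms, collection_size,
                         reference_size=REFERENCE_COLLECTION_SIZE):
    """Estimate the search time for a collection of `collection_size` templates."""
    if collection_size < 0 or reference_size <= 0:
        raise ValueError("collection sizes must be positive")
    return reported_ms * collection_size / reference_size


# A 48.6 ms search over 1,000,000 templates, rescaled to 10,000 templates.
scaled_ms = scale_search_time_ms(48.6, 10_000)
print(f"{scaled_ms:.3f} ms")
```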

All benchmarks were performed using 1280x720 pixel images containing one face or object, with CPU only and the smallest face height set to 40 px. RAM usage refers to maximum resident memory. 1 to N identification search time scales linearly with collection size, and also scales linearly with the number of available cores. Batch identification was tested with 100 probe templates and is used to increase throughput. The enrollment template size represents a conservative average case; the actual size varies with the length of the identity string. TFV5 and FULL models have the same inference time. x86_64 benchmarks were run on Ubuntu 20.04. NVIDIA Jetson benchmarks were run on Ubuntu 18.04. Raspberry Pi 4 benchmarks were run on Gentoo v1.6.0.
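The per-template sizes reported for 1 to N search can be turned into an approximate memory budget for a collection. The sketch below assumes memory is simply the template count multiplied by the average enrollment template size; the function name is hypothetical and not part of the SDK, and real usage varies with identity string length.

```python
# Hypothetical helper, not an SDK API: approximate collection memory from the
# average enrollment template sizes reported on this page.
def collection_memory_bytes(num_templates, bytes_per_template):
    """Return the approximate memory needed to hold a collection, in bytes."""
    return num_templates * bytes_per_template


full_bytes = collection_memory_bytes(1_000_000, 2250)        # TFV5
compressed_bytes = collection_memory_bytes(1_000_000, 1250)  # TFV5, vector compression

print(f"TFV5: {full_bytes / 1e9:.2f} GB")
print(f"TFV5, vector compression: {compressed_bytes / 1e9:.2f} GB")
```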

SDK Benchmarks - GPU

| Operation | VRAM Usage | Speed RTX 3080 Laptop | Speed GTX 1070 | Speed Jetson Xavier NX, 15W 6 core | Speed Jetson Xavier NX, 10W 4 core | Speed Jetson Nano, Max |
|---|---|---|---|---|---|---|
| Preprocess Image | - | 5.3 ms | | | | |
| Face and landmark detection | 1.9 GB | 4.6 ms | 17.5 ms | 19.9 ms | 20.4 ms | 52 ms |
| Face recognition template extraction TFV5, batch size = 1 | 2.73 GB | 8.7 ms | 11.6 ms | 70 ms | 80 ms | 250 ms |
| Face recognition template extraction TFV5, batch size = 2 | 2.89 GB | 5.2 ms | 7.9 ms | - | - | - |
| Face recognition template extraction TFV5, batch size = 4 | 3.22 GB | 4.1 ms | 6.4 ms | - | - | - |
| Face recognition template extraction TFV5, batch size = 8 | 3.78 GB | 3.25 ms | 5.7 ms | - | - | - |
| Face recognition template extraction TFV5, batch size = 16 | 4.91 GB | 2.62 ms | 5.1 ms | - | - | - |
| Face recognition template extraction TFV5, batch size = 32 | 6.85 GB | 2.25 ms | 6.3 ms | - | - | - |
| Face recognition template extraction TFV6, batch size = 1 | 1.5 GB | 2.20 ms | - | 15.36 ms | 23.49 ms | - |
| Face recognition template extraction TFV6, batch size = 2 | 1.5 GB | 1.44 ms | - | - | - | - |
| Face recognition template extraction TFV6, batch size = 4 | 1.5 GB | 1.14 ms | - | 10.51 ms | 15.37 ms | - |
| Face recognition template extraction TFV6, batch size = 8 | 1.5 GB | 1.04 ms | - | 8.75 ms | 10.95 ms | - |
| Face recognition template extraction TFV6, batch size = 16 | 1.5 GB | 0.93 ms | - | - | - | - |
| Face recognition template extraction TFV6, batch size = 32 | 1.6 GB | 0.85 ms | - | - | - | - |
| Mask Detection, batch size = 1 | 1.3 GB | 0.87 ms | | | | |
| Mask Detection, batch size = 4 | 1.3 GB | 0.23 ms | | | | |

1 to N identification search is performed on CPU only, so refer to CPU times.

Speed refers to the average time per input.

What average time per input means:

With a batch size of 4, we generate 4 face recognition templates at the same time. The total time taken to generate those templates is 16.4 ms, meaning the average time per template is 4.1 ms.

With a batch size of 64, we generate 64 face recognition templates at the same time. The total time taken to generate those templates is 136.96 ms, meaning the average time per template is 2.14 ms.
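The two examples above reduce to a single division, which the following sketch reproduces. The function names are hypothetical and not part of the SDK; the throughput figure is derived here as the reciprocal of the average time and is not a value reported on this page.

```python
# Hypothetical helpers, not SDK APIs: relate total batch time, average time
# per input, and the implied throughput.
def average_time_per_input_ms(total_batch_ms, batch_size):
    """Average time per input = total time for the batch / number of inputs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return total_batch_ms / batch_size


def throughput_per_second(avg_ms_per_input):
    """Inputs processed per second at the given average time per input."""
    return 1000.0 / avg_ms_per_input


avg_batch_4 = average_time_per_input_ms(16.4, 4)      # batch size 4 example
avg_batch_64 = average_time_per_input_ms(136.96, 64)  # batch size 64 example

print(f"batch 4:  {avg_batch_4:.2f} ms per template")
print(f"batch 64: {avg_batch_64:.2f} ms per template")
print(f"batch 64 throughput: {throughput_per_second(avg_batch_64):.0f} templates/s")
```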

All benchmarks were performed using 1280x720 pixel images containing one face or object, with GPU enabled and the smallest face height set to 40 px. OMP_NUM_THREADS was set to 2 to limit CPU processing. Batching does not increase throughput on NVIDIA Jetson because the device draws too much current and is automatically throttled. GPU benchmarks use FP16 inference where applicable (TFV6).