How to create large-scale collections quickly and efficiently
Pre-processing and extracting templates from millions of images can be a bottleneck for any deployment. The problem is further exacerbated by the RAM overhead of using CUDA and the lack of threaded engine support in most deep learning frameworks. The following tutorial demonstrates how to create large-scale collections efficiently.
This tutorial uses 10,000 1080x1080 images obtained from the FFHQ dataset. For the sake of simplicity, the tutorial assumes the image names represent the identities. We will use the GPU Trueface SDK through the Python bindings API, and we will take advantage of batch template generation.
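Since the identity label comes straight from the file name, it helps to be explicit about the mapping. Below is a minimal sketch of that convention; the identity_from_path helper name is mine, and the extension-stripping variant is optional, since the enrollment script further down keeps the extension in the label.

import os

def identity_from_path(path):
    # "/home/cyrus/Downloads/69000/69012.png" -> "69012.png"
    # This mirrors the enrollment script below, which uses the base name as the identity.
    return os.path.basename(os.path.normpath(path))

def identity_without_extension(path):
    # Optional variant: "69012.png" -> "69012"
    return os.path.splitext(identity_from_path(path))[0]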
Before we get started, here's the spec of the machine we will be running our script on:
CPU:
- Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz, 16 cores, 32 GB RAM
GPU:
- NVIDIA GeForce RTX 3080 Laptop GPU, 16GB VRAM
Let's get started. The following script demonstrates how to use the Trueface SDK to load images, run face detection, generate face recognition templates in batch, and finally enroll them into a collection. I advise you to start by navigating to our benchmarks page to get an understanding of the max batch size your GPU can support (this is limited by available GPU RAM). For my GPU, it's around 100.
For the sake of the demo, I'll run the following script twice: the first time with batch_size = 1, and the second time with batch_size = 100, in order to demonstrate the speedup resulting from batching.
Batch size = 1:
Total time: 327s
Batch size = 100:
Total time: 250s
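End to end, that is roughly a 1.3x speedup (327s / 250s). Note that this understates the gain in template generation itself, since image loading and face detection run one image at a time in both configurations. If you would like to reproduce the measurement, a minimal timing harness along these lines works; time.perf_counter and the enroll_all wrapper are my own scaffolding, not part of the SDK.

import time

def timed(label, fn, *args):
    # Wall-clock timer around an arbitrary function call
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.0f}s")
    return result

# Usage sketch: wrap the enrollment loop below in a function,
# e.g. enroll_all(batch_size), then compare the two runs:
# timed("Total time (batch_size=1)", enroll_all, 1)
# timed("Total time (batch_size=100)", enroll_all, 100)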
# Sample code: Generate face recognition templates for images in batch, then enroll them into a collection.
# This sample app demonstrates how you can enroll face recognition templates, or Faceprints, into a collection on disk in batch.
# We take advantage of GPU batching to increase the template generation throughput.
# First, we create a database and create a new collection within that database.
# Next, we generate face recognition templates in batch and enroll those templates into the collection.
# Note: after running this sample app, you can run the identification_1_n sample app.
# For this sample app, we will assume the image name is the identity we want to enroll.
import tfsdk
import os
import glob


def generate_feature_vectors(face_chips, face_identities):
    # Generate templates for the entire batch in a single GPU call
    res, faceprints = sdk.get_face_feature_vectors(face_chips)
    if res != tfsdk.ERRORCODE.NO_ERROR:
        print("Unable to extract face feature vectors")
        quit()

    # Now enroll the faceprints into our collection
    for i in range(len(face_chips)):
        res, UUID = sdk.enroll_faceprint(faceprints[i], face_identities[i])
        if res != tfsdk.ERRORCODE.NO_ERROR:
            print("Unable to enroll feature vector")
            continue

    print("================================")
    print("Enrolled batch of:", len(face_chips), "Faceprints in collection")
    print("================================")
    face_chips.clear()
    face_identities.clear()


options = tfsdk.ConfigurationOptions()
# Set configuration options here
options.dbms = tfsdk.DATABASEMANAGEMENTSYSTEM.SQLITE  # Save the templates in an SQLite database
# To use a PostgreSQL database instead:
# options.dbms = tfsdk.DATABASEMANAGEMENTSYSTEM.POSTGRESQL
options.smallest_face_height = 200
options.fr_model = tfsdk.FACIALRECOGNITIONMODEL.TFV5
# Note: if you use TFV5, you will need to run the download script in /download_models to obtain the model file
options.enable_GPU = True  # Batching is only supported on GPU

sdk = tfsdk.SDK(options)

# TODO: export your license token as the TRUEFACE_TOKEN environment variable
is_valid = sdk.set_license(os.environ['TRUEFACE_TOKEN'])
if not is_valid:
    print("Invalid License Provided")
    print("Be sure to export your license token as TRUEFACE_TOKEN")
    quit()

# Create a new database
res = sdk.create_database_connection("my_database.db")
if res != tfsdk.ERRORCODE.NO_ERROR:
    print("Unable to create database connection")
    quit()

# ex. If using the POSTGRESQL backend...
# res = sdk.create_database_connection("host=localhost port=5432 dbname=my_database user=postgres password=admin")
# if res != tfsdk.ERRORCODE.NO_ERROR:
#     print("Unable to create database connection")
#     quit()

# Create a new collection
res = sdk.create_load_collection("my_collection")
if res != tfsdk.ERRORCODE.NO_ERROR:
    print("Unable to create collection")
    quit()

# TODO: Change the following to point to the directory containing your images.
image_paths = glob.glob("/home/cyrus/Downloads/69000/*.png")

face_chips = []
face_identities = []

# TODO: Choose a batch size based on your GPU memory.
# Consult our benchmarks page: https://docs.trueface.ai/Benchmarks-0b648f5a0cb84badb6425a12697a15e5
batch_size = 100

img_num = 0
for path in image_paths:
    img_num += 1
    if img_num % 50 == 0:
        print("Processing image:", img_num, "/", len(image_paths))

    # Load the image into the SDK
    res = sdk.set_image(path)
    if res != tfsdk.ERRORCODE.NO_ERROR:
        print("Unable to set image at path:", path)
        continue

    # Detect the largest face in the image
    found, faceBoxAndLandmarks = sdk.detect_largest_face()
    if not found:
        print("No face detected in image:", path)
        continue

    # Get the aligned face chip
    face = sdk.extract_aligned_face(faceBoxAndLandmarks)

    # Add the face chip and its identity to our arrays for batch processing
    face_chips.append(face)
    face_identities.append(os.path.basename(os.path.normpath(path)))

    # Only generate feature vectors once we have accumulated a full batch
    if len(face_chips) != batch_size:
        continue

    # Generate feature vectors in batch
    generate_feature_vectors(face_chips, face_identities)

if len(face_chips) > 0:
    # Process the final, partially filled batch and clear the arrays
    generate_feature_vectors(face_chips, face_identities)
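Once the collection is populated, you can search it. The sketch below runs a 1:N query with a probe image by reusing the detection and template calls from the script above; the identify_top_candidate call and its threshold argument follow the identification_1_n sample app referenced in the comments, so treat the exact signature as an assumption and verify it against your SDK version. The probe path is a placeholder.

# Sketch: 1:N identification against the collection we just built.
# Assumes the same sdk instance and configuration as above.
res = sdk.set_image("/path/to/probe.png")  # placeholder path
if res != tfsdk.ERRORCODE.NO_ERROR:
    quit()

found, faceBoxAndLandmarks = sdk.detect_largest_face()
if not found:
    quit()

face = sdk.extract_aligned_face(faceBoxAndLandmarks)
res, faceprints = sdk.get_face_feature_vectors([face])
if res != tfsdk.ERRORCODE.NO_ERROR:
    quit()

# Search the collection for the closest match above a similarity threshold
res, found, candidate = sdk.identify_top_candidate(faceprints[0], threshold=0.3)
if res == tfsdk.ERRORCODE.NO_ERROR and found:
    print("Match:", candidate.identity, "probability:", candidate.match_probability)
else:
    print("No match found in collection")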