How to create large-scale collections quickly and efficiently
Pre-processing and extracting templates from millions of images can be a bottleneck for any deployment. The problem is further exacerbated by the RAM overhead of using CUDA and the lack of threaded engine support in most deep learning frameworks. The following tutorial demonstrates how to create large-scale collections efficiently.
This tutorial uses 10,000 1080x1080 images obtained from the FFHQ dataset. For the sake of simplicity, the tutorial assumes the image names represent the identities. We will use the GPU Trueface SDK through the Python bindings API, and we will take advantage of batch template generation.
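Since the identity label comes straight from the file name, it helps to be explicit about the mapping. Below is a minimal sketch of that convention; the identity_from_path helper name is mine, and the extension-stripping variant is optional, since the enrollment script further down keeps the extension in the label.

import os

def identity_from_path(path):
    # "/home/cyrus/Downloads/69000/69012.png" -> "69012.png"
    # This mirrors the enrollment script below, which uses the base name as the identity.
    return os.path.basename(os.path.normpath(path))

def identity_without_extension(path):
    # Optional variant: "69012.png" -> "69012"
    return os.path.splitext(identity_from_path(path))[0]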
Before we get started, here's the spec of the machine we will be running our script on:
CPU:
- Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz, 16 cores, 32 GB RAM
GPU:
- NVIDIA GeForce RTX 3080 Laptop GPU, 16GB VRAM
Let's get started. The following script demonstrates how to use the Trueface SDK to load images, run face detection, generate face recognition templates in batch, and finally enroll them into a collection. I advise you to start by navigating to our benchmarks page to get an understanding of the max batch size your GPU can support (this is limited by available GPU RAM). For my GPU, it's around 100.
For the sake of the demo, I'll run the following script twice: the first time with batch_size = 1, and the second time with batch_size = 100, in order to demonstrate the speedup resulting from batching.
Batch size = 1:
Total time: 327s
Batch size = 100:
Total time: 250s
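End to end, that is roughly a 1.3x speedup (327s / 250s). Note that this understates the gain in template generation itself, since image loading and face detection run one image at a time in both configurations. If you would like to reproduce the measurement, a minimal timing harness along these lines works; time.perf_counter and the enroll_all wrapper are my own scaffolding, not part of the SDK.

import time

def timed(label, fn, *args):
    # Wall-clock timer around an arbitrary function call
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.0f}s")
    return result

# Usage sketch: wrap the enrollment loop below in a function,
# e.g. enroll_all(batch_size), then compare the two runs:
# timed("Total time (batch_size=1)", enroll_all, 1)
# timed("Total time (batch_size=100)", enroll_all, 100)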
# Sample code: Generate face recognition templates for images in batch, then enroll them into a collection.
# This sample app demonstrates how you can enroll face recognition templates, or Faceprints, into a collection on disk in batch.
# We take advantage of GPU batching to increase the template generation throughput.
# First, we create a database and create a new collection within that database.
# Next, we generate face recognition templates in batch and enroll those templates into the collection.
# Note: after running this sample app, you can run the identification_1_n sample app.
# For this sample app, we will assume the image name is the identity we want to enroll.
import tfsdk
import os
import glob


def generate_feature_vectors(face_chips, face_identities):
    # Generate templates for the entire batch in a single GPU call
    res, faceprints = sdk.get_face_feature_vectors(face_chips)
    if res != tfsdk.ERRORCODE.NO_ERROR:
        print("Unable to extract face feature vectors")
        quit()

    # Now enroll the faceprints into our collection
    for i in range(len(face_chips)):
        res, UUID = sdk.enroll_faceprint(faceprints[i], face_identities[i])
        if res != tfsdk.ERRORCODE.NO_ERROR:
            print("Unable to enroll feature vector")
            continue

    print("================================")
    print("Enrolled batch of:", len(face_chips), "Faceprints in collection")
    print("================================")
    face_chips.clear()
    face_identities.clear()


options = tfsdk.ConfigurationOptions()
# Set configuration options here
options.dbms = tfsdk.DATABASEMANAGEMENTSYSTEM.SQLITE  # Save the templates in an SQLite database
# To use a PostgreSQL database instead:
# options.dbms = tfsdk.DATABASEMANAGEMENTSYSTEM.POSTGRESQL
options.smallest_face_height = 200
options.fr_model = tfsdk.FACIALRECOGNITIONMODEL.TFV5
# Note: if you use TFV5, you will need to run the download script in /download_models to obtain the model file
options.enable_GPU = True  # Batching is only supported on GPU

sdk = tfsdk.SDK(options)

# TODO: export your license token as the TRUEFACE_TOKEN environment variable
is_valid = sdk.set_license(os.environ['TRUEFACE_TOKEN'])
if not is_valid:
    print("Invalid License Provided")
    print("Be sure to export your license token as TRUEFACE_TOKEN")
    quit()

# Create a new database
res = sdk.create_database_connection("my_database.db")
if res != tfsdk.ERRORCODE.NO_ERROR:
    print("Unable to create database connection")
    quit()

# ex. If using the POSTGRESQL backend...
# res = sdk.create_database_connection("host=localhost port=5432 dbname=my_database user=postgres password=admin")
# if res != tfsdk.ERRORCODE.NO_ERROR:
#     print("Unable to create database connection")
#     quit()

# Create a new collection
res = sdk.create_load_collection("my_collection")
if res != tfsdk.ERRORCODE.NO_ERROR:
    print("Unable to create collection")
    quit()

# TODO: Change the following to point to the directory containing your images.
image_paths = glob.glob("/home/cyrus/Downloads/69000/*.png")

face_chips = []
face_identities = []

# TODO: Choose a batch size based on your GPU memory.
# Consult our benchmarks page: https://docs.trueface.ai/Benchmarks-0b648f5a0cb84badb6425a12697a15e5
batch_size = 100

img_num = 0
for path in image_paths:
    img_num += 1
    if img_num % 50 == 0:
        print("Processing image:", img_num, "/", len(image_paths))

    # Load the image into the SDK
    res = sdk.set_image(path)
    if res != tfsdk.ERRORCODE.NO_ERROR:
        print("Unable to set image at path:", path)
        continue

    # Detect the largest face in the image
    found, faceBoxAndLandmarks = sdk.detect_largest_face()
    if not found:
        print("No face detected in image:", path)
        continue

    # Get the aligned face chip
    face = sdk.extract_aligned_face(faceBoxAndLandmarks)

    # Add the face chip and its identity to our arrays for batch processing
    face_chips.append(face)
    face_identities.append(os.path.basename(os.path.normpath(path)))

    # Only generate feature vectors once we have accumulated a full batch
    if len(face_chips) != batch_size:
        continue

    # Generate feature vectors in batch
    generate_feature_vectors(face_chips, face_identities)

if len(face_chips) > 0:
    # Process the final, partially filled batch and clear the arrays
    generate_feature_vectors(face_chips, face_identities)
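Once the collection is populated, you can search it. The sketch below runs a 1:N query with a probe image by reusing the detection and template calls from the script above; the identify_top_candidate call and its threshold argument follow the identification_1_n sample app referenced in the comments, so treat the exact signature as an assumption and verify it against your SDK version. The probe path is a placeholder.

# Sketch: 1:N identification against the collection we just built.
# Assumes the same sdk instance and configuration as above.
res = sdk.set_image("/path/to/probe.png")  # placeholder path
if res != tfsdk.ERRORCODE.NO_ERROR:
    quit()

found, faceBoxAndLandmarks = sdk.detect_largest_face()
if not found:
    quit()

face = sdk.extract_aligned_face(faceBoxAndLandmarks)
res, faceprints = sdk.get_face_feature_vectors([face])
if res != tfsdk.ERRORCODE.NO_ERROR:
    quit()

# Search the collection for the closest match above a similarity threshold
res, found, candidate = sdk.identify_top_candidate(faceprints[0], threshold=0.3)
if res == tfsdk.ERRORCODE.NO_ERROR and found:
    print("Match:", candidate.identity, "probability:", candidate.match_probability)
else:
    print("No match found in collection")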