
Setting Up Your Environment for DeepLabV3 Training

In the evolving landscape of computer vision, accurately understanding and segmenting images is crucial for applications ranging from autonomous vehicles to medical diagnostics. Semantic segmentation, a key technique, assigns a class label to every pixel in an image, providing a dense understanding of its contents. This comprehensive guide will walk you through the process of training a powerful DeepLabV3 model with a lightweight MobileNetV2 backbone using TensorFlow, leveraging the efficient TensorFlow Model Garden package.

This tutorial trains a DeepLabV3 model with a MobileNetV2 backbone from the TensorFlow Model Garden package (tensorflow-models). Model Garden contains a collection of state-of-the-art models implemented with TensorFlow's high-level APIs. The implementations demonstrate best practices for modeling, letting users take full advantage of TensorFlow for their research and product development. Our chosen dataset for this demonstration is the Oxford-IIIT Pets dataset, a popular benchmark for image segmentation tasks.

The Oxford-IIIT Pets dataset is a 37-category pet image dataset with roughly 200 images per class. The images have large variations in scale, pose, and lighting. Every image has an associated ground-truth breed annotation, and, importantly for this tutorial, a pixel-level trimap segmentation mask. This tutorial demonstrates how to use models from the TensorFlow Models package: train/fine-tune a pre-built DeepLabV3 with a MobileNetV2 backbone for semantic segmentation, then export the trained model.

Setting Up Your Environment for DeepLabV3 Training

The first step in any machine learning project is to ensure your development environment is correctly configured. This includes installing all necessary dependencies and importing the required libraries. For this TensorFlow Model Garden tutorial, we’ll begin by installing the official TensorFlow Models package.

To install necessary dependencies, simply run:

pip install -U -q "tf-models-official"

Once the installation is complete, we’ll import all the libraries essential for building, training, and evaluating our semantic segmentation model. These include standard Python libraries for file operations and data manipulation, alongside TensorFlow-specific modules from the `tf-models-official` package. This enables us to tap into powerful functionalities for data processing, model construction, and training orchestration.

Import required libraries:

import os
import pprint
import numpy as np
import matplotlib.pyplot as plt

from IPython import display

import tensorflow as tf
import tensorflow_datasets as tfds

import orbit
import tensorflow_models as tfm
from official.vision.data import tfrecord_lib
from official.vision.utils import summary_manager
from official.vision.serving import export_saved_model_lib
from official.vision.utils.object_detection import visualization_utils

pp = pprint.PrettyPrinter(indent=4) # Set Pretty Print Indentation
print(tf.__version__) # Check the version of tensorflow used

%matplotlib inline

Upon execution, you’ll see the TensorFlow version, confirming your environment is ready. (The reported TensorFlow version is 2.15.0).
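If you plan to train on an accelerator, it is also worth confirming that TensorFlow can see it before going further. This quick check is not part of the original steps, but it can save a long CPU-only run later:

print(tf.config.list_physical_devices('GPU'))  # An empty list means no GPU is visible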

Custom Dataset Preparation for Semantic Segmentation

Effective training of DeepLabV3 with MobileNetV2 hinges on proper dataset handling. Models in the Model Garden Official repository expect input data in the TFRecord format, a TensorFlow-specific binary storage format optimized for efficient input pipelines. See the TensorFlow documentation on TFRecord and tf.train.Example to learn more about the format.

We load the Oxford-IIIT Pets dataset directly from TensorFlow Datasets, splitting it into training, validation, and test sets. The splits are configured as ‘train+test[:50%]’, ‘test[50%:80%]’, and ‘test[80%:100%]’ respectively. This ensures a consistent and reproducible division of our image segmentation data.

(train_ds, val_ds, test_ds), info = tfds.load(
    'oxford_iiit_pet:3.*.*',
    split=['train+test[:50%]', 'test[50%:80%]', 'test[80%:100%]'],
    with_info=True)

The dataset information (`info`) details its structure, including features like `file_name`, `image`, `label`, and crucially, `segmentation_mask`. This mask is our ground truth for semantic segmentation. With the dataset loaded, we need to convert it into the TFRecords format. This involves a helper function to encode each record (image and mask) into a TensorFlow Example protobuf.
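If you want to verify these features yourself, a quick optional inspection of the dataset metadata looks like this:

pp.pprint(info.features)  # FeaturesDict with image, label, segmentation_mask, etc.
print(info.splits)        # Example counts for the underlying train/test splits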

Helper function to encode dataset as tfrecords:

def process_record(record):
    # Serialize one dataset record (image + mask) into a tf.train.Example.
    keys_to_features = {
        'image/encoded': tfrecord_lib.convert_to_feature(
            tf.io.encode_jpeg(record['image']).numpy()),
        'image/height': tfrecord_lib.convert_to_feature(record['image'].shape[0]),
        'image/width': tfrecord_lib.convert_to_feature(record['image'].shape[1]),
        # Shift trimap labels {1, 2, 3} to {0, 1, 2} before PNG-encoding the mask.
        'image/segmentation/class/encoded': tfrecord_lib.convert_to_feature(
            tf.io.encode_png(record['segmentation_mask'] - 1).numpy())
    }
    example = tf.train.Example(
        features=tf.train.Features(feature=keys_to_features))
    return example

Next, we write these processed records to a dedicated folder. This organized approach ensures our custom dataset is readily available in the format required by the TensorFlow Model Garden’s DeepLabV3 implementation. Separate TFRecord files are generated for the training, validation, and test splits, often sharded for parallel processing efficiency.

Write TFRecords to a folder:

output_dir = './oxford_iiit_pet_tfrecords/'
LOG_EVERY = 100
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

def write_tfrecords(dataset, output_path, num_shards=1):
    # One writer per shard; records are distributed round-robin across shards.
    writers = [
        tf.io.TFRecordWriter(
            output_path + '-%05d-of-%05d.tfrecord' % (i, num_shards))
        for i in range(num_shards)
    ]
    for idx, record in enumerate(dataset):
        if idx % LOG_EVERY == 0:
            print('On image %d' % idx)
        tf_example = process_record(record)
        writers[idx % num_shards].write(tf_example.SerializeToString())

Write training data as TFRecords:

output_train_tfrecs = output_dir + 'train'
write_tfrecords(train_ds, output_train_tfrecs, num_shards=10)

Write validation data as TFRecords:

output_val_tfrecs = output_dir + 'val'
write_tfrecords(val_ds, output_val_tfrecs, num_shards=5)

Write test data as TFRecords:

output_test_tfrecs = output_dir + 'test'
write_tfrecords(test_ds, output_test_tfrecs, num_shards=5)
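As an optional sanity check (not part of the original tutorial), you can read one record back from the shards and decode it with the same feature keys used in process_record:

# Optional: read back one record and decode the image and mask to verify the pipeline.
raw_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(output_dir + 'train*'))
feature_description = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    'image/segmentation/class/encoded': tf.io.FixedLenFeature([], tf.string),
}
for raw_record in raw_dataset.take(1):
    parsed = tf.io.parse_single_example(raw_record, feature_description)
    image = tf.io.decode_jpeg(parsed['image/encoded'])
    mask = tf.io.decode_png(parsed['image/segmentation/class/encoded'])
    print('image:', image.shape, 'mask:', mask.shape)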

Configuring the DeepLabV3 MobileNetV2 Model

With our data ready, the next phase involves configuring the DeepLabV3 MobileNetV2 model for our custom dataset. In Model Garden, the collections of parameters that define a model are called configs. Model Garden can create a config based on a known set of parameters via a factory. We utilize the `mnv2_deeplabv3_pascal` experiment configuration, as defined by `tfm.vision.configs.semantic_segmentation.mnv2_deeplabv3_pascal`.

This configuration specifically defines an experiment to train a DeepLabV3 model with MobileNetV2 as its backbone and ASPP (Atrous Spatial Pyramid Pooling) as its decoder. You can explore other alternative experiments by modifying the experiment name argument to the `get_exp_config` function, such as `seg_deeplabv3_pascal` or `mnv2_deeplabv3plus_cityscapes`.

exp_config = tfm.core.exp_factory.get_exp_config('mnv2_deeplabv3_pascal')
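Before modifying anything, it can help to print the experiment configuration and skim the defaults; the Model Garden config objects provide an as_dict() helper for this:

pp.pprint(exp_config.as_dict())  # Full default config for this experiment (long output)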

Before fine-tuning, it's beneficial to initialize our model with pre-trained weights. We download a checkpoint for DeepLabV3 MobileNetV2 pre-trained on the COCO dataset, which provides a strong starting point for transfer learning.

model_ckpt_path = './model_ckpt/'
if not os.path.exists(model_ckpt_path):
    os.mkdir(model_ckpt_path)

!gsutil cp gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.data-00000-of-00001 './model_ckpt/'
!gsutil cp gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.index './model_ckpt/'

We then adjust the model and dataset configurations to align with our specific requirements. This involves setting the number of classes (3, matching the Oxford-IIIT trimap labels: pet foreground, background, and the pet outline), the input image size (128×128), and the batch size. Crucially, we point to the downloaded checkpoint, freeze the MobileNetV2 backbone for initial fine-tuning, and specify the paths to our TFRecords for training and validation data.

num_classes = 3
WIDTH, HEIGHT = 128, 128
input_size = [HEIGHT, WIDTH, 3]
BATCH_SIZE = 16

# Glob patterns matching the TFRecord shards written earlier
train_data_tfrecords = output_dir + 'train*'
val_data_tfrecords = output_dir + 'val*'

# Backbone Config
exp_config.task.init_checkpoint = model_ckpt_path + 'best_ckpt-63'
exp_config.task.freeze_backbone = True

# Model Config
exp_config.task.model.num_classes = num_classes
exp_config.task.model.input_size = input_size

# Training Data Config
exp_config.task.train_data.aug_scale_min = 1.0
exp_config.task.train_data.aug_scale_max = 1.0
exp_config.task.train_data.input_path = train_data_tfrecords
exp_config.task.train_data.global_batch_size = BATCH_SIZE
exp_config.task.train_data.dtype = 'float32'
exp_config.task.train_data.output_size = [HEIGHT, WIDTH]
exp_config.task.train_data.preserve_aspect_ratio = False
exp_config.task.train_data.seed = 21  # Reproducible Training Data

# Validation Data Config
exp_config.task.validation_data.input_path = val_data_tfrecords
exp_config.task.validation_data.global_batch_size = BATCH_SIZE
exp_config.task.validation_data.dtype = 'float32'
exp_config.task.validation_data.output_size = [HEIGHT, WIDTH]
exp_config.task.validation_data.preserve_aspect_ratio = False
exp_config.task.validation_data.groundtruth_padded_size = [HEIGHT, WIDTH]
exp_config.task.validation_data.seed = 21  # Reproducible Validation Data
exp_config.task.validation_data.resize_eval_groundtruth = True  # To enable validation loss

Finally, we fine-tune the trainer configuration. This involves setting the total training steps (e.g., 2000), along with intervals for summaries, checkpoints, and validation. We also define the learning rate schedule, opting for a cosine decay with a linear warmup, a common practice for stable training in deep learning. Device detection determines if training will run on CPU, GPU, or TPU, informing the appropriate strategy.

train_steps = 2000
exp_config.trainer.steps_per_loop = int(train_ds.__len__().numpy() // BATCH_SIZE)

exp_config.trainer.summary_interval = exp_config.trainer.steps_per_loop
exp_config.trainer.checkpoint_interval = exp_config.trainer.steps_per_loop
exp_config.trainer.validation_interval = exp_config.trainer.steps_per_loop
exp_config.trainer.validation_steps = int(val_ds.__len__().numpy() // BATCH_SIZE)
exp_config.trainer.train_steps = train_steps
exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = exp_config.trainer.steps_per_loop
exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'
exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps
exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.1
exp_config.trainer.optimizer_config.warmup.linear.warmup_learning_rate = 0.05

Initiating Training with the Task Object

With the model and dataset configurations finalized, the next crucial step is to set up the distribution strategy. This strategy dictates how TensorFlow utilizes available hardware resources, whether it’s a single CPU, multiple GPUs, or even a TPU cluster, to optimize training performance. For GPU environments, `tf.distribute.MirroredStrategy` is commonly used to distribute training across devices.

if exp_config.runtime.mixed_precision_dtype == tf.float16:
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

logical_device_names = [logical_device.name
                        for logical_device in tf.config.list_logical_devices()]

if 'GPU' in ''.join(logical_device_names):
    print('This may be broken in Colab.')
    distribution_strategy = tf.distribute.MirroredStrategy()
elif 'TPU' in ''.join(logical_device_names):
    print('This may be broken in Colab.')
    tf.tpu.experimental.initialize_tpu_system()
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='/device:TPU_SYSTEM:0')
    distribution_strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    print('Warning: this will be really slow.')
    distribution_strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])

print("Done")

The final piece of the puzzle is to create the Task object (tfm.core.base_task.Task) from the config_definitions.TaskConfig. The Task object has all the methods necessary for building the dataset, building the model, and running training and evaluation. These methods are driven by `tfm.core.train_lib.run_experiment`. By instantiating this task within the scope of our chosen distribution strategy, we prepare the system for robust and distributed DeepLabV3 MobileNetV2 training.

model_dir = './trained_model/'

with distribution_strategy.scope():
    task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)
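Before launching a full run, an optional sanity check is to pull a single batch through the task's input pipeline and confirm that the shapes match our configuration. The snippet below follows the pattern used in Model Garden tutorials; note that the exact keys in the label dictionary (here 'masks') can vary between library versions:

# Optional: build one batch from the task's training input pipeline and inspect shapes.
for images, labels in task.build_inputs(exp_config.task.train_data).take(1):
    print(f'images.shape: {images.shape}, dtype: {images.dtype!r}')
    print(f'masks.shape: {labels["masks"].shape}, dtype: {labels["masks"].dtype!r}')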

With the task object configured, you are now ready to launch the training process, fine-tuning your DeepLabV3 MobileNetV2 model for precise semantic segmentation on your custom dataset. This structured approach, leveraging the TensorFlow Model Garden, streamlines complex computer vision tasks.
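For completeness, here is a sketch of how training and export are typically launched with the Model Garden helpers imported earlier. The argument values mirror the configuration above; the export directory name is an arbitrary choice, and run_post_eval=True returns the final evaluation metrics alongside the trained model:

# Run training and evaluation driven by the experiment config.
model, eval_logs = tfm.core.train_lib.run_experiment(
    distribution_strategy=distribution_strategy,
    task=task,
    mode='train_and_eval',
    params=exp_config,
    model_dir=model_dir,
    run_post_eval=True)

# Export the fine-tuned model as a SavedModel for inference.
export_saved_model_lib.export_inference_graph(
    input_type='image_tensor',
    batch_size=1,
    input_image_size=[HEIGHT, WIDTH],
    params=exp_config,
    checkpoint_path=tf.train.latest_checkpoint(model_dir),
    export_dir='./exported_model/')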

Conclusion

Training a DeepLabV3 model with MobileNetV2 as its backbone for semantic segmentation using TensorFlow’s Model Garden offers a powerful and efficient path to advanced computer vision applications. By following the steps outlined — from dependency installation and custom dataset preparation in TFRecords format to precise model and trainer configuration — you can successfully fine-tune a state-of-the-art model for your specific needs.

The flexibility and high-level APIs provided by the TensorFlow ecosystem simplify what could otherwise be a daunting task. Whether you’re working with medical images, autonomous driving data, or simply want to segment objects in everyday photos, this approach equips you with a robust framework. Experiment with different configurations, explore alternative backbones, and deploy your newly trained semantic segmentation model to unlock new possibilities in image understanding.

:::info
Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.
:::
