Setting Up Your Environment for Instance Segmentation

In the dynamic world of computer vision, instance segmentation stands out as a powerful technique that not only identifies objects but also precisely outlines each instance within an image. Moving beyond simple bounding boxes, it provides a pixel-level understanding of a scene. If you’re looking to dive into this advanced area, training a Mask R-CNN model is an excellent starting point, especially when leveraging the robust capabilities of TensorFlow Model Garden.
TensorFlow Model Garden is a treasure trove of state-of-the-art machine learning models, meticulously implemented with TensorFlow’s high-level APIs. It serves as an invaluable resource for researchers and developers, showcasing best practices for modeling and enabling you to harness the full potential of TensorFlow for your projects.
This tutorial fine-tunes a Mask R-CNN model with a MobileNet V2 backbone from the TensorFlow Model Garden package (tensorflow-models), letting you take full advantage of TensorFlow for research and product development.
This tutorial demonstrates how to:
- Use models from the TensorFlow Models package.
- Train or fine-tune a pre-built Mask R-CNN with a MobileNet backbone for object detection and instance segmentation.
- Export the trained/tuned Mask R-CNN model.
Our journey will guide you through setting up your environment, preparing a custom dataset, configuring the Mask R-CNN, and initiating the training process for your instance segmentation task.
Setting Up Your Environment for Instance Segmentation
The first step in any deep learning project is ensuring your development environment is correctly configured with all the necessary libraries and tools. For this Mask R-CNN instance segmentation task, we’ll begin by installing TensorFlow Model Garden and other essential packages.
Content Overview
- Install Necessary Dependencies
- Import required libraries
- Download a subset of the LVIS dataset
- Configure the Mask R-CNN MobileNet FPN COCO model for the custom dataset
- Create the Task object (tfm.core.base_task.Task) from the config_definitions.TaskConfig
To get started, execute the following commands in your environment. These ensure you have the core TensorFlow Models package, along with utilities for data handling and image processing.
Install Necessary Dependencies
pip install -U -q "tf-models-official"
pip install -U -q remotezip tqdm opencv-python einops
Once the dependencies are installed, the next crucial step is to import all the required Python libraries. These imports bring in functionalities ranging from file operations and data manipulation to advanced TensorFlow APIs and specialized vision utilities from the TensorFlow Models package.
Import required libraries
import os
import io
import json
import tqdm
import shutil
import pprint
import pathlib
import tempfile
import requests
import collections
import matplotlib
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image
from six import BytesIO
from etils import epath
from IPython import display
from urllib.request import urlopen
You might observe some system warnings related to CUDA or cuDNN during initialization. These are often benign messages indicating attempts to register factories that are already present and typically do not hinder execution.
We’ll also need specific TensorFlow Models components for building and training our instance segmentation model effectively, alongside orbit for flexible training loops and tensorflow_datasets for data management.
import orbit
import tensorflow as tf
import tensorflow_models as tfm
import tensorflow_datasets as tfds
from official.core import exp_factory
from official.core import config_definitions as cfg
from official.vision.data import tfrecord_lib
from official.vision.serving import export_saved_model_lib
from official.vision.dataloaders.tf_example_decoder import TfExampleDecoder
from official.vision.utils.object_detection import visualization_utils
from official.vision.ops.preprocess_ops import normalize_image, resize_and_crop_image
from official.vision.data.create_coco_tf_record import coco_annotations_to_lists
pp = pprint.PrettyPrinter(indent=4)  # Set pretty print indentation
print(tf.__version__)  # Check the version of TensorFlow used
%matplotlib inline
Confirming your TensorFlow version is a good practice to ensure compatibility with the Model Garden components. As of this guide, we are using version:
2.15.0
Preparing Your Custom Dataset with LVIS
For our instance segmentation task, we’ll utilize a subset of the LVIS dataset. LVIS, which stands for Large Vocabulary Instance Segmentation, is known for its extensive categories and challenges in detecting rare objects, making it an excellent choice for fine-tuning robust models like Mask R-CNN.
Download a subset of the LVIS dataset
LVIS: A dataset for large vocabulary instance segmentation.
:::tip
Note: LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split.
:::
First, download the LVIS annotation files. These JSON files contain all the intricate details about the objects and their masks within the images.
# @title Download annotation files
wget https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip
unzip -q lvis_v1_train.json.zip
rm lvis_v1_train.json.zip
wget https://dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip
unzip -q lvis_v1_val.json.zip
rm lvis_v1_val.json.zip
wget https://dl.fbaipublicfiles.com/LVIS/lvis_v1_image_info_test_dev.json.zip
unzip -q lvis_v1_image_info_test_dev.json.zip
rm lvis_v1_image_info_test_dev.json.zip
Upon executing these commands, you will see output confirming the successful download and extraction of these large annotation archives, indicating that the dataset files are ready for processing.
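Before converting anything, you can optionally peek at one of the downloaded annotation files to confirm it follows the familiar COCO-style layout. The snippet below is just an illustrative sanity check, not part of the original notebook:
# Optional sanity check: the LVIS annotations are COCO-style JSON files.
with open('./lvis_v1_val.json', 'r') as f:
  lvis_val = json.load(f)
print(list(lvis_val.keys()))
print(f"images: {len(lvis_val['images'])}, annotations: {len(lvis_val['annotations'])}")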
To prepare the data for TensorFlow Model Garden, we need to convert these annotations into TFRecords, a standard format for large datasets in TensorFlow. This involves parsing the JSON files and creating helper functions to structure the data correctly, including image paths, bounding boxes, and segmentation masks.
Key helper functions, such as an LvisAnnotation class and a _generate_tf_records function, are employed for this conversion. The LvisAnnotation class handles the complex LVIS JSON structure, while _generate_tf_records iterates through the dataset, fetches images, and converts their annotations into TensorFlow Example format, then writes them into sharded TFRecord files. This process is crucial for efficient data loading during training.
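The full implementations live in the accompanying notebook. As an illustration, a minimal version of the category-map helper used in the next cell could look like the sketch below; this get_category_map is a simplified stand-in that keeps the first NUM_CLASSES categories, not the exact notebook implementation:
# Simplified sketch: build a category_index in the format expected by the
# Model Garden visualization utilities ({id: {'id': ..., 'name': ...}}).
def get_category_map(annotation_path, num_classes):
  with open(annotation_path, 'r') as f:
    data = json.load(f)
  category_map = {}
  for category in data['categories'][:num_classes]:
    category_map[category['id']] = {'id': category['id'], 'name': category['name']}
  return category_map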
_URLS = {
    'train_images': 'http://images.cocodataset.org/zips/train2017.zip',
    'validation_images': 'http://images.cocodataset.org/zips/val2017.zip',
    'test_images': 'http://images.cocodataset.org/zips/test2017.zip',
}
train_prefix = 'train'
valid_prefix = 'val'
train_annotation_path = './lvis_v1_train.json'
valid_annotation_path = './lvis_v1_val.json'
IMGS_DIR = './lvis_sub_dataset/'
tf_records_dir = './lvis_tfrecords/'
if not os.path.exists(IMGS_DIR):
  os.mkdir(IMGS_DIR)
if not os.path.exists(tf_records_dir):
  os.mkdir(tf_records_dir)
NUM_CLASSES = 3
category_index = get_category_map(valid_annotation_path, NUM_CLASSES)
category_ids = list(category_index.keys())
With our helper functions and paths defined, we can now generate the TFRecords for both the training and validation splits of our LVIS subset. The following commands execute the data processing pipeline, downloading images and packaging them with their annotations. You’ll observe progress indicators as the records are generated.
# The helper functions used here are adapted from the TensorFlow Datasets LVIS builder:
# https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/datasets/lvis/lvis_dataset_builder.py
_generate_tf_records(train_prefix, _URLS['train_images'], train_annotation_path)
_generate_tf_records(valid_prefix, _URLS['validation_images'], valid_annotation_path)
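To make the last step of that pipeline concrete, the sketch below shows roughly how sharded TFRecord files are written. write_sharded_tf_records is a hypothetical name used for illustration only; the real helper additionally downloads the COCO images and builds each tf.train.Example from the parsed annotations:
# Simplified sketch: distribute serialized tf.train.Example protos across shards.
def write_sharded_tf_records(examples, output_prefix, num_shards=10):
  writers = [
      tf.io.TFRecordWriter(f'{tf_records_dir}{output_prefix}-{i:05d}-of-{num_shards:05d}.tfrecord')
      for i in range(num_shards)
  ]
  for idx, example in enumerate(examples):
    writers[idx % num_shards].write(example.SerializeToString())
  for writer in writers:
    writer.close()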
Configuring and Training Your Mask R-CNN Model
With our dataset prepared, the next phase involves configuring the Mask R-CNN model itself. TensorFlow Model Garden simplifies this with predefined experiment configurations, which act as blueprints for various models.
Configure the Mask R-CNN MobileNet FPN COCO model for the custom dataset
train_data_input_path = './lvis_tfrecords/train*'
valid_data_input_path = './lvis_tfrecords/val*'
test_data_input_path = './lvis_tfrecords/test*'
model_dir = './trained_model/'
export_dir = './exported_model/'
if not os.path.exists(model_dir):
  os.mkdir(model_dir)
In Model Garden, the collections of parameters that define a model are called configs. Model Garden can create a config from a known set of parameters via a factory. We use the maskrcnn_mobilenet_coco experiment configuration, as defined by tfm.vision.configs.maskrcnn.maskrcnn_mobilenet_coco. You can find all the registered experiments here. This configuration defines an experiment that trains a Mask R-CNN model with a MobileNet backbone and an FPN decoder. The default configuration is trained on COCO train2017 and evaluated on COCO val2017. Other experiments are also available, such as maskrcnn_resnetfpn_coco, maskrcnn_spinenet_coco, and more. You can switch to them by changing the experiment name argument passed to the get_exp_config function.
We’ll start by retrieving the configuration for a Mask R-CNN with a MobileNet backbone, designed for COCO-style datasets. This provides a solid foundation for our custom instance segmentation task.
exp_config = exp_factory.get_exp_config('maskrcnn_mobilenet_coco')
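If you want to try one of the alternative experiments mentioned above, only the experiment name passed to the factory changes:
# Alternative registered experiments (commented out), e.g. ResNet-FPN or SpineNet backbones:
# exp_config = exp_factory.get_exp_config('maskrcnn_resnetfpn_coco')
# exp_config = exp_factory.get_exp_config('maskrcnn_spinenet_coco')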
Pre-trained checkpoints are invaluable for fine-tuning, as they provide a model that has already learned rich features from a large dataset. We’ll download a MobileNet V2 checkpoint to initialize our backbone.
model_ckpt_path = './model_ckpt/'
if not os.path.exists(model_ckpt_path):
  os.mkdir(model_ckpt_path)
!gsutil cp gs://tf_model_garden/vision/mobilenet/v2_1.0_float/ckpt-180648.data-00000-of-00001 './model_ckpt/'
!gsutil cp gs://tf_model_garden/vision/mobilenet/v2_1.0_float/ckpt-180648.index './model_ckpt/'
Upon completion, you’ll see messages confirming the successful copying of these checkpoint files to your local directory.
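Before wiring the checkpoint into the configuration, you can optionally verify that it reads correctly by listing a few of its variables; this is an illustrative check rather than part of the original notebook:
# Optional sanity check: list some variables stored in the downloaded checkpoint.
ckpt_vars = tf.train.list_variables('./model_ckpt/ckpt-180648')
print(f'{len(ckpt_vars)} variables found; first few:')
for name, shape in ckpt_vars[:5]:
  print(name, shape)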
Now, we adjust the experiment configuration to align with our custom LVIS dataset. This includes defining batch sizes, image dimensions, the number of classes, and pointing to our generated TFRecords. We also specify that the backbone should be frozen during initial training, leveraging the pre-trained weights.
Adjust the model and dataset configurations so that it works with custom dataset.
BATCH_SIZE = 8
HEIGHT, WIDTH = 256, 256
IMG_SHAPE = [HEIGHT, WIDTH, 3]
# Backbone Config
exp_config.task.annotation_file = None
exp_config.task.freeze_backbone = True
exp_config.task.init_checkpoint = "./model_ckpt/ckpt-180648"
exp_config.task.init_checkpoint_modules = "backbone"
# Model Config
exp_config.task.model.num_classes = NUM_CLASSES + 1
exp_config.task.model.input_size = IMG_SHAPE
# Training Data Config
exp_config.task.train_data.input_path = train_data_input_path
exp_config.task.train_data.dtype = 'float32'
exp_config.task.train_data.global_batch_size = BATCH_SIZE
exp_config.task.train_data.shuffle_buffer_size = 64
exp_config.task.train_data.parser.aug_scale_max = 1.0
exp_config.task.train_data.parser.aug_scale_min = 1.0
# Validation Data Config
exp_config.task.validation_data.input_path = valid_data_input_path
exp_config.task.validation_data.dtype = 'float32'
exp_config.task.validation_data.global_batch_size = BATCH_SIZE
Beyond model parameters, the trainer configuration dictates how the training process itself unfolds. This involves setting the number of training steps, validation intervals, and learning rate schedules crucial for effective deep learning optimization.
Adjust the trainer configuration.
logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]
if 'GPU' in ''.join(logical_device_names):
  print('This may be broken in Colab.')
  device = 'GPU'
elif 'TPU' in ''.join(logical_device_names):
  print('This may be broken in Colab.')
  device = 'TPU'
else:
  print('Running on CPU is slow, so only train for a few steps.')
  device = 'CPU'
train_steps = 2000
exp_config.trainer.steps_per_loop = 200  # steps_per_loop = num_of_training_examples // train_batch_size
exp_config.trainer.summary_interval = 200
exp_config.trainer.checkpoint_interval = 200
exp_config.trainer.validation_interval = 200
exp_config.trainer.validation_steps = 200 # validation_steps = num_of_validation_examples // eval_batch_size
exp_config.trainer.train_steps = train_steps
exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = 200
exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'
exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps
exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.07
exp_config.trainer.optimizer_config.warmup.linear.warmup_learning_rate = 0.05
The system will detect your available hardware, with warnings if running on Colab or CPU (which will be slower), as indicated by the output:
This may be broken in Colab.
To inspect the full, modified configuration before proceeding, you can print the experiment configuration dictionary. This provides a comprehensive overview of all model, data, and training parameters, from runtime settings to specific loss functions and data augmentation strategies. The output is extensive, detailing every aspect.
Print the modified configuration.
pp.pprint(exp_config.as_dict())
display.Javascript("google.colab.output.setIframeHeight('500px');")
Setting up the distribution strategy is crucial for leveraging available hardware, whether it’s GPUs or TPUs, to accelerate training. TensorFlow’s MirroredStrategy or TPUStrategy ensures efficient computation across multiple devices.
# Setting up the Strategy
if exp_config.runtime.mixed_precision_dtype == tf.float16:
  tf.keras.mixed_precision.set_global_policy('mixed_float16')
if 'GPU' in ''.join(logical_device_names):
  distribution_strategy = tf.distribute.MirroredStrategy()
elif 'TPU' in ''.join(logical_device_names):
  tf.tpu.experimental.initialize_tpu_system()
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='/device:TPU_SYSTEM:0')
  distribution_strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
  print('Warning: this will be really slow.')
  distribution_strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])
print("Done")
With the strategy in place, the system reports the devices being utilized:
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
Done
Create the Task object (tfm.core.base_task.Task) from the config_definitions.TaskConfig. The Task object has all the methods necessary for building the dataset, building the model, and running training and evaluation. These methods are driven by tfm.core.train_lib.run_experiment.
with distribution_strategy.scope():
  task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)
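With the Task object and distribution strategy in place, training and evaluation are driven by tfm.core.train_lib.run_experiment. The two snippets below are minimal sketches following the usage pattern in the official Model Garden notebooks rather than code from this tutorial; exact arguments may differ slightly across Model Garden versions.
model, eval_logs = tfm.core.train_lib.run_experiment(
    distribution_strategy=distribution_strategy,
    task=task,
    mode='train_and_eval',
    params=exp_config,
    model_dir=model_dir,
    run_post_eval=True)
Once training finishes, the same configuration can drive the export step listed in the overview, using the export_saved_model_lib helper imported earlier and the image size, config, and directories defined above:
# Sketch: export the latest fine-tuned checkpoint as a SavedModel for inference.
export_saved_model_lib.export_inference_graph(
    input_type='image_tensor',
    batch_size=1,
    input_image_size=[HEIGHT, WIDTH],
    params=exp_config,
    checkpoint_path=tf.train.latest_checkpoint(model_dir),
    export_dir=export_dir)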
Conclusion
Training a Mask R-CNN for instance segmentation with TensorFlow Model Garden offers a streamlined and powerful approach to advanced computer vision tasks. By following these steps—from setting up your environment and meticulously preparing your custom dataset to configuring the model with pre-trained weights and defining training parameters—you can successfully fine-tune a robust instance segmentation model.
The flexibility and state-of-the-art implementations within Model Garden empower you to tackle complex problems like accurate object detection and pixel-perfect segmentation. This foundation provides a strong starting point for further experimentation, model optimization, and deployment in real-world applications. Dive deeper into the TensorFlow Model Garden documentation to explore more possibilities and elevate your computer vision projects.
:::info
Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.
:::




