Building RESTful APIs with TensorFlow Serving

TensorFlow Serving is a powerful tool that allows you to deploy your TensorFlow models as scalable and production-ready RESTful APIs. This means you can easily serve your trained models and make predictions from any client application over HTTP requests. In this article, we will explore how to build RESTful APIs with TensorFlow Serving.

What is a RESTful API?

REST, which stands for Representational State Transfer, is an architectural style that defines a set of constraints to build scalable web services. RESTful APIs (Application Programming Interfaces) adhere to these constraints and enable communication between clients and servers. They use HTTP methods (GET, POST, PUT, DELETE) to perform various operations on resources.

Why use TensorFlow Serving?

TensorFlow Serving provides an efficient and flexible way to serve your machine learning models. It is designed for low-latency, high-throughput inference, making it suitable for real-time applications. TensorFlow Serving also supports model versioning: several versions of a model can be loaded side by side, which lets you roll out a new version gradually or A/B test it against the current one. Additionally, it provides monitoring and logging capabilities for better observability.
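
To make the versioning point concrete, here is a minimal sketch of how a client could address a specific model version through the REST API. The host, port, and model name are placeholders, and the helper predict_url is a hypothetical name introduced only for illustration. It assumes both versions are actually loaded; by default TensorFlow Serving serves only the latest version it finds.

```python
# Sketch: building TensorFlow Serving REST URLs, optionally pinned to a version.
# Host, port, and model name are placeholders, not values from a real deployment.

def predict_url(model_name, version=None, host="localhost", port=8501):
    """Return the predict URL for a model, optionally pinned to one version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        # Address an explicit version instead of the default (latest) one.
        base += f"/versions/{version}"
    return base + ":predict"


# Two URLs an A/B test could split traffic between.
url_a = predict_url("your_model_name", version=1)
url_b = predict_url("your_model_name", version=2)
print(url_a)
print(url_b)
```

How traffic is split between the two URLs is left to the calling application; this sketch only shows the addressing scheme.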

Setting up TensorFlow Serving

Before we start building RESTful APIs with TensorFlow Serving, we need to set it up. The easiest way to install TensorFlow Serving is via Docker. Run the following command to pull the TensorFlow Serving Docker image:

docker pull tensorflow/serving

Next, you can start a TensorFlow Serving container by running:

docker run -p 8501:8501 --name=your_container_name --mount type=bind,source=/path/to/your/model/directory,target=/models/your_model_name -e MODEL_NAME=your_model_name -t tensorflow/serving

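Replace your_container_name with the desired name for your container, your_model_name with the name under which the model should be served, and /path/to/your/model/directory with the path to your TensorFlow SavedModel directory. TensorFlow Serving expects this directory to contain one numbered subdirectory per model version (for example, 1/), each holding a SavedModel. Port 8501 is the port on which the container exposes its REST API.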
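
Before building anything on top of the container, it can help to confirm that the model actually loaded. The sketch below queries the model status endpoint and lists the versions reported as AVAILABLE. It uses only the standard library so it runs without extra packages; the function names and the hand-written sample response are illustrative assumptions, not output captured from a real server.

```python
import json
import urllib.request


def available_versions(status):
    """Return the version strings whose state is AVAILABLE in a status response."""
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def fetch_status(model_name, host="localhost", port=8501):
    """GET the model status from a running TensorFlow Serving container."""
    url = f"http://{host}:{port}/v1/models/{model_name}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Hand-written example of the response shape, for illustration only.
sample = {
    "model_version_status": [
        {"version": "1", "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}}
    ]
}
print(available_versions(sample))

# With the container running, the live check would be:
# print(available_versions(fetch_status("your_model_name")))
```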

Creating a RESTful API with TensorFlow Serving

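Once TensorFlow Serving is up and running, we can start building our RESTful API. TensorFlow Serving already exposes its own REST endpoint on port 8501; the API we build here is a thin application layer in front of it, which gives you a place for input validation or pre- and post-processing. There are several frameworks available for building APIs, such as Flask or Django. In this example, we will use Flask due to its simplicity and ease of use.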

First, let's install Flask and the requests library by running:

pip install flask requests

Now, let's create a new Python script and import the necessary modules:

from flask import Flask, request, jsonify
import requests

Next, we instantiate the Flask application and define a route for our API endpoint:

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # Retrieve the input data from the request
    data = request.get_json()

    # Make a POST request to TensorFlow Serving's predict endpoint
    response = requests.post('http://localhost:8501/v1/models/your_model_name:predict',
                             json={'instances': [data]})

    # Retrieve the predictions from the response
    predictions = response.json()['predictions']

    # Return the predictions as a JSON response
    return jsonify(predictions=predictions)

In the predict function, we extract the input data from the POST request, make a request to TensorFlow Serving using the requests library, and retrieve the predictions. Finally, we return the predictions as a JSON response.
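
The request and response bodies exchanged with TensorFlow Serving follow a simple shape: inputs go under an instances key, and results come back under predictions, or under error when something went wrong. The following sketch isolates that handling in two small helpers. The helper names and the example numbers are hypothetical, and it assumes a model that takes fixed-length numeric vectors.

```python
def build_payload(rows):
    """Wrap a list of input rows in the row-format request body."""
    return {"instances": list(rows)}


def extract_predictions(body):
    """Return the predictions from a parsed response, or raise on a server error."""
    if "error" in body:
        # TensorFlow Serving reports failures as {"error": "<message>"}.
        raise ValueError(f"TensorFlow Serving error: {body['error']}")
    return body["predictions"]


# Illustrative values only; real shapes depend on the model's signature.
payload = build_payload([[1.0, 2.0], [3.0, 4.0]])
print(payload)
print(extract_predictions({"predictions": [[0.1], [0.9]]}))
```

Checking for the error key before indexing into predictions avoids an unhelpful KeyError in the Flask handler when the model rejects an input.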

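To run the Flask application, execute the following command (Flask 2.2 or later), replacing your_script_name with the name of your script without the .py extension:

flask --app your_script_name run
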
Now, you can send POST requests to http://localhost:5000/predict with your input data and receive the predictions from TensorFlow Serving.
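
As a rough illustration of such a request, the sketch below builds a JSON POST for the /predict endpoint using only the standard library. The feature vector is a made-up example, and the helper names are assumptions introduced here; the actual input shape depends on your model.

```python
import json
import urllib.request


def build_predict_request(features, url="http://localhost:5000/predict"):
    """Build a JSON POST request for the Flask /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Flask's request.get_json() expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=10):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical input; replace with data matching your model's signature.
req = build_predict_request([1.0, 2.0, 3.0])
print(req.get_method(), req.full_url)

# With the Flask app and TensorFlow Serving both running:
# print(send(req))
```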


Conclusion

In this article, we learned how to build RESTful APIs with TensorFlow Serving. TensorFlow Serving provides a robust and scalable solution for deploying TensorFlow models as production-ready APIs. By combining it with a framework like Flask, you can put a small application layer in front of your models and serve predictions in real time to any client that can send an HTTP request.
