Making your PyTorch models work in environments with only TensorFlow

Converting a Simple Deep Learning Model from PyTorch to TensorFlow

Reference: https://towardsdatascience.com/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6

Introduction

TensorFlow and PyTorch are two of the most popular frameworks for deep learning. Some prefer TensorFlow for its deployment support, while others prefer PyTorch for the flexibility it offers in model building and training, without the difficulties faced in using TensorFlow. The downside of using PyTorch is that models built and trained with it are difficult to deploy into production. (Update in Dec 2019: later versions of PyTorch are claimed to have better deployment support, but I believe that is something else to be explored.) One solution to the problem of deploying models built with PyTorch is ONNX (Open Neural Network Exchange).

As explained on ONNX’s About page, ONNX is like a bridge that links the various deep learning frameworks together, enabling the conversion of models from one framework to another. At the time of writing, ONNX is limited to simpler model structures, but further additions may come later. This article will illustrate how a simple deep learning model can be converted from PyTorch to TensorFlow.

Installing the necessary packages

To start off, we need to install PyTorch, TensorFlow, ONNX, and ONNX-TF (the package that converts ONNX models to TensorFlow). If you are using virtualenv on Linux, you could run the commands below (replace tensorflow with tensorflow-gpu if you have NVIDIA CUDA installed). Do note that, as of Dec 2019, ONNX does not work with TensorFlow 2.0 yet, so take note of the version of TensorFlow that you install.

source <path-to-virtual-environment>/bin/activate
pip install tensorflow==1.15.0

# For PyTorch, choose one of the following (refer to https://pytorch.org/get-started/locally/ for further details)
pip install torch torchvision # if using CUDA 10.1
pip install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f https://download.pytorch.org/whl/torch_stable.html # if using CUDA 9.2
pip install torch==1.3.1+cpu torchvision==0.4.2+cpu -f https://download.pytorch.org/whl/torch_stable.html # if using CPU only

pip install onnx

# For onnx-tensorflow, you may want to refer to the installation guide here: https://github.com/onnx/onnx-tensorflow
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
pip install -e .

If using Conda, you may want to run the following commands instead:

conda activate <name-of-environment>
conda install -c pytorch pytorch

pip install tensorflow==1.15.0

pip install onnx

# For onnx-tensorflow, you may want to refer to the installation guide here: https://github.com/onnx/onnx-tensorflow
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
pip install -e .

I find that installing TensorFlow, ONNX, and ONNX-TF with pip ensures that the packages are compatible with one another. It is fine, however, to install the packages in other ways, as long as they work properly on your machine.

To test that the packages have been installed correctly, you can run the following commands:

python
import tensorflow as tf
import torch
import onnx
from onnx_tf.backend import prepare

If you do not see any error messages, it means that the packages are installed correctly, and we are good to go.
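You may also want to confirm the installed versions, since ONNX-TF does not support TensorFlow 2.0 yet. A quick check like the following (a small sketch; the exact versions printed depend on your installation) can save some debugging later:

import tensorflow as tf
import torch
import onnx

# TensorFlow must be on the 1.x line (e.g. 1.15.0) for ONNX-TF to work
print('TensorFlow version:', tf.__version__)
print('PyTorch version:', torch.__version__)
print('ONNX version:', onnx.__version__)
assert tf.__version__.startswith('1.'), 'ONNX-TF does not support TensorFlow 2.0 yet'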

In this example, I used Jupyter Notebook, but the conversion can also be done in a .py file. To install Jupyter Notebook, you can run one of the following commands:

# Installing Jupyter Notebook via pip
pip install notebook

# Installing Jupyter Notebook via Conda
conda install notebook

Building, training, and evaluating the example model

The next thing to do is to obtain a PyTorch model for the conversion. In this example, I generated some simulated data and used it to train and evaluate a simple Multilayer Perceptron (MLP) model. The following snippet shows how the installed packages are imported, and how I generated and prepared the data.

import numpy as np

import os
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf

# Generate simulated data
train_size = 8000
test_size = 2000

input_size = 20
hidden_sizes = [50, 50]
output_size = 1
num_classes = 2

X_train = np.random.randn(train_size, input_size).astype(np.float32)
X_test = np.random.randn(test_size, input_size).astype(np.float32)
y_train = np.random.randint(num_classes, size=train_size)
y_test = np.random.randint(num_classes, size=test_size)
print('Shape of X_train:', X_train.shape)
print('Shape of X_test:', X_test.shape)
print('Shape of y_train:', y_train.shape)
print('Shape of y_test:', y_test.shape)

# Define Dataset subclass to facilitate batch training
class SimpleDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Create DataLoaders for training and test set, for batch training and evaluation
train_loader = DataLoader(dataset=SimpleDataset(X_train, y_train), batch_size=8, shuffle=True)
test_loader = DataLoader(dataset=SimpleDataset(X_test, y_test), batch_size=8, shuffle=False)

I then created a class for the simple MLP model, defining the layers such that any number and size of hidden layers can be specified. I also defined a binary cross entropy loss and an Adam optimizer to be used for the loss computation and weight updates during training. The following snippet shows this process.

# Build model
class SimpleModel(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size):
        super(SimpleModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.fcs = []  # List of fully connected layers
        in_size = input_size
        
        for i, next_size in enumerate(hidden_sizes):
            fc = nn.Linear(in_features=in_size, out_features=next_size)
            in_size = next_size
            self.__setattr__('fc{}'.format(i), fc)  # register each fully connected layer as a named attribute so its parameters are tracked
            self.fcs.append(fc)
            
        self.last_fc = nn.Linear(in_features=in_size, out_features=output_size)
        
    def forward(self, x):
        for i, fc in enumerate(self.fcs):
            x = fc(x)
            x = nn.ReLU()(x)
        out = self.last_fc(x)
        return nn.Sigmoid()(out)
      
# Set device to be used
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device used:', device)
model_pytorch = SimpleModel(input_size=input_size, hidden_sizes=hidden_sizes, output_size=output_size)
model_pytorch = model_pytorch.to(device)

# Set loss and optimizer
# Set binary cross entropy loss since 2 classes only
criterion = nn.BCELoss()
optimizer = optim.Adam(model_pytorch.parameters(), lr=1e-3)

After building the model and defining the loss and optimizer, I trained the model for 20 epochs using the generated training set, then used the test set for evaluation. The test loss and accuracy of the model were not good, but that does not really matter here, as the main purpose is to show how to convert a PyTorch model to TensorFlow. The snippet below shows the training and evaluation process.

num_epochs = 20

# Train model
time_start = time.time()

for epoch in range(num_epochs):
    model_pytorch.train()
    
    train_loss_total = 0
    
    for data, target in train_loader:
        data, target = data.to(device), target.float().to(device)
        optimizer.zero_grad()
        output = model_pytorch(data)
        train_loss = criterion(output, target)
        train_loss.backward()
        optimizer.step()
        train_loss_total += train_loss.item() * data.size(0)
        
    print('Epoch {} completed. Train loss is {:.3f}'.format(epoch + 1, train_loss_total / train_size))
print('Time taken to complete {} epochs: {:.2f} minutes'.format(num_epochs, (time.time() - time_start) / 60))

# Evaluate model
model_pytorch.eval()

test_loss_total = 0
total_num_corrects = 0
threshold = 0.5
time_start = time.time()

# No gradient computation is needed during evaluation
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.float().to(device)
        output = model_pytorch(data)
        test_loss = criterion(output, target)
        test_loss_total += test_loss.item() * data.size(0)

        pred = (output >= threshold).view_as(target)  # to make pred have same shape as target
        num_correct = torch.sum(pred == target.byte()).item()
        total_num_corrects += num_correct

print('Evaluation completed. Test loss is {:.3f}'.format(test_loss_total / test_size))
print('Test accuracy is {:.3f}'.format(total_num_corrects / test_size))
print('Time taken to complete evaluation: {:.2f} minutes'.format((time.time() - time_start) / 60))

After training and evaluating the model, we would need to save the model, as below:

if not os.path.exists('./models/'):
    os.mkdir('./models/')

torch.save(model_pytorch.state_dict(), './models/model_simple.pt')

Converting the model to TensorFlow

Now, we need to convert the .pt file to a .onnx file using the torch.onnx.export function. There are two things we need to take note of here: 1) we need to pass a dummy input through the PyTorch model before exporting, and 2) the dummy input needs to have the shape (1, dimension(s) of single input). For example, if a single input is an image array with the shape (number of channels, height, width), then the dummy input needs to have the shape (1, number of channels, height, width). The dummy input is needed as an input placeholder for the resulting TensorFlow model. The following snippet shows the process of exporting the PyTorch model in the ONNX format. I included the input and output names as arguments as well, to make inference in TensorFlow easier.

model_pytorch = SimpleModel(input_size=input_size, hidden_sizes=hidden_sizes, output_size=output_size)
model_pytorch.load_state_dict(torch.load('./models/model_simple.pt'))
model_pytorch = model_pytorch.to(device)  # move the model to the same device as the dummy input

# Single pass of dummy variable required
dummy_input = torch.from_numpy(X_test[0].reshape(1, -1)).float().to(device)
dummy_output = model_pytorch(dummy_input)
print(dummy_output)

# Export to ONNX format
torch.onnx.export(model_pytorch, dummy_input, './models/model_simple.onnx', input_names=['input'], output_names=['output'])

After getting the .onnx file, we would need to use the prepare() function in ONNX-TF’s backend module to convert the model from ONNX to TensorFlow.

# Load ONNX model and convert to TensorFlow format
model_onnx = onnx.load('./models/model_simple.onnx')

tf_rep = prepare(model_onnx)

# Print out tensors and placeholders in model (helpful during inference in TensorFlow)
print(tf_rep.tensor_dict)

# Export model as .pb file
tf_rep.export_graph('./models/model_simple.pb')

If you have specified the input and output names in the torch.onnx.export function, you should see the keys ‘input’ and ‘output’ along with their corresponding values, as shown in the snippet below. The names ‘input:0’ and ‘Sigmoid:0’ will be used during inference in TensorFlow.

{'fc0.bias': <tf.Tensor ...>, 'fc0.weight': <tf.Tensor ...>, 'fc1.bias': <tf.Tensor ...>, 'fc1.weight': <tf.Tensor ...>, 'last_fc.bias': <tf.Tensor ...>, 'last_fc.weight': <tf.Tensor ...>, 'input': <tf.Tensor 'input:0' ...>, '7': <tf.Tensor ...>, '8': <tf.Tensor ...>, '9': <tf.Tensor ...>, '10': <tf.Tensor ...>, '11': <tf.Tensor ...>, 'output': <tf.Tensor 'Sigmoid:0' ...>}
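You can also retrieve these names programmatically instead of reading them off the printout (a small sketch, assuming tensor_dict maps node names to tf.Tensor objects, as shown above):

print(tf_rep.tensor_dict['input'].name)   # e.g. 'input:0'
print(tf_rep.tensor_dict['output'].name)  # e.g. 'Sigmoid:0'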

Doing inference in TensorFlow

Here comes the fun part, which is to see if the resultant TensorFlow model can do inference as intended. Loading a TensorFlow model from a .pb file can be done by defining the following function.

def load_pb(path_to_pb):
    with tf.gfile.GFile(path_to_pb, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

With the function to load the model defined, we need to start a TensorFlow graph session, specify the placeholders for the input and output, and feed an input into the session.

tf_graph = load_pb('./models/model_simple.pb')
sess = tf.Session(graph=tf_graph)

output_tensor = tf_graph.get_tensor_by_name('Sigmoid:0')
input_tensor = tf_graph.get_tensor_by_name('input:0')

# feed_dict expects a NumPy array, so convert the PyTorch tensor first
output = sess.run(output_tensor, feed_dict={input_tensor: dummy_input.cpu().numpy()})
print(output)

If all goes well, the result of print(output) should match that of print(dummy_output) in the earlier step.
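If you want a stricter check than eyeballing the printed numbers, a comparison along the following lines should work (a small sketch, assuming output and dummy_output from the earlier steps are still in scope):

import numpy as np

# The two frameworks will not be bit-identical, so allow a small tolerance
np.testing.assert_allclose(output, dummy_output.detach().cpu().numpy(), rtol=1e-5, atol=1e-6)
print('PyTorch and TensorFlow outputs match within tolerance')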

Conclusion

ONNX can be pretty straightforward, provided that your model is not too complicated. The steps in this example work for deep learning models with a single input and output. Models with multiple inputs and/or outputs are more challenging to convert via ONNX. As such, an example of converting such models will have to wait for another article, unless newer versions of ONNX can handle them.

The Jupyter notebook containing all the code can be found here.

This article was originally published in Towards Data Science.
