The Eager Way to Build Deep Learning Models

Simple Neural Network Model Using TensorFlow Eager Execution

Introduction

Eager Execution is a nifty approach in TensorFlow (TF) to build deep learning models from scratch. It lets you prototype models without the hassles that come with the graph-based approach that TF conventionally uses.

For example, with Eager Execution, there is no need to start a graph session in order to perform tensor computations. This means faster debugging, as you can check the result of each line of computation on the fly without wrapping the computation in a graph session.

As a disclaimer, however, using Eager Execution requires some knowledge of the matrix algebra concepts used in deep learning, particularly of how forward passes are done in a neural network. If you are looking for something more high-level and ready for use, I would advise using the Keras API in TF, or PyTorch, instead.

This article will provide an example of how Eager Execution can be used, by describing the procedure to build, train and evaluate a simple Multilayer Perceptron.


Architecture and Notations

The neural network built in this example consists of an input layer, one hidden layer, and an output layer. The input layer contains 3 nodes, the hidden layer 20 nodes, and the output layer has 1 node. The output value is continuous (i.e. the neural network performs regression).

The values of the input, hidden and output layers, as well as the weights between the layers, can be expressed as matrices. The biases to the hidden and output layers can be expressed as vectors (a special case of matrices with one row or column). The image below shows the dimensions for each of the matrices and vectors.


Notations and dimensions for matrices and vectors
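To make the dimensions concrete, here is a minimal NumPy sketch. The variable names mirror those used later in the model class (W_xh, b_h, W_hy, b_y); the batch size of 4, the zero-filled arrays and the use of NumPy instead of TF are assumptions made purely for illustration.

```python
import numpy as np

n = 4              # number of observations in a batch (illustrative choice)
size_input = 3     # nodes in the input layer
size_hidden = 20   # nodes in the hidden layer
size_output = 1    # nodes in the output layer

X = np.zeros((n, size_input))                 # inputs
W_xh = np.zeros((size_input, size_hidden))    # weights: input -> hidden
b_h = np.zeros((1, size_hidden))              # biases for the hidden layer
W_hy = np.zeros((size_hidden, size_output))   # weights: hidden -> output
b_y = np.zeros((1, size_output))              # biases for the output layer

H = X @ W_xh + b_h   # hidden layer values (before any activation)
Y = H @ W_hy + b_y   # output layer values

# Collect the shape of every matrix and vector for inspection
shapes = {name: arr.shape for name, arr in
          [("X", X), ("W_xh", W_xh), ("b_h", b_h), ("H", H),
           ("W_hy", W_hy), ("b_y", b_y), ("Y", Y)]}
print(shapes)
```

The bias vectors have a single row, so they are broadcast across all observations when added to the matrix products.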

Beginning Eager Execution

After importing the dependencies needed for this example (mainly NumPy and TF), you need to enable Eager Execution if you are using a TF 1.x release; in TF 2.0 it is enabled by default. The code snippet below shows how Eager Execution can be enabled.

import time

import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe  # TF 1.x only; tf.contrib was removed in TF 2.0

# Enable Eager Execution (TF 1.x; must be called at program startup, before any other TF operation)
tf.enable_eager_execution()
# Check whether Eager Execution is enabled (returns True if it is)
tf.executing_eagerly()

Preparing the Data for Training and Evaluation

The next step is to randomly generate some data for use in training and evaluation (for illustration purposes of course), by using NumPy’s random module. With this approach, I created two separate sets of data, one for training and the other for evaluation.

Each set of data contains one input array and one output array. The input array has the shape (number of observations, number of features), while the output array has the shape (number of observations, number of output values per observation). The number of features corresponds to the number of nodes in the input layer, while the number of output values per observation corresponds to the number of nodes in the output layer.

After generating the data, I split the test data into batches for more efficient evaluation. The training data is also split into batches, but this is done during the training process itself.

The snippet below shows how I prepared the data.

# Define size of input, hidden and output layers
size_input = 3
size_hidden = 20
size_output = 1

X_train = np.random.randn(800, size_input)
X_test = np.random.randn(200, size_input)
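# Outputs have shape (number of observations, number of output values per observation)
y_train = np.random.randn(800, size_output)
y_test = np.random.randn(200, size_output)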

# Split test dataset into batches (training dataset will be randomly split into batches during each epoch of model training)
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(4)

Building the Model

What I did here was to create a Python class that holds the code responsible for weight and bias initialization, the forward pass, backpropagation, and updates to the weights and biases.

The weights and biases are initialized by sampling random values from a standard normal distribution. Random initialization is typically preferred over initializing all the weights with a constant such as 0 or 1: with identical weights, every node in a layer computes the same value and receives the same update, so the nodes never learn different features. A sensible random initialization can also reduce the chance of issues such as vanishing gradients.
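To illustrate the point about constant initialization, the NumPy sketch below compares a hidden layer whose weights are all 1 with one whose weights are sampled randomly. It is an illustration only and not part of the original notebook; the helper hidden_layer() and the 5-observation input are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 5 observations, 3 features

def hidden_layer(X, W_xh, b_h):
    """ReLU hidden layer: relu(X W_xh + b_h)."""
    return np.maximum(X @ W_xh + b_h, 0.0)

# Constant initialization: every hidden node gets identical weights
H_const = hidden_layer(X, np.ones((3, 20)), np.ones((1, 20)))
# Random initialization: every hidden node gets different weights
H_rand = hidden_layer(X, rng.standard_normal((3, 20)), rng.standard_normal((1, 20)))

# Compare every column (hidden node) with the first column
const_nodes_identical = np.allclose(H_const, H_const[:, :1])
rand_nodes_identical = np.allclose(H_rand, H_rand[:, :1])
print(const_nodes_identical, rand_nodes_identical)
```

With constant weights, all 20 hidden nodes carry exactly the same information, so the layer behaves like a single node; random weights give each node a different starting point.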

The forward pass can be described by the following equations. relu() denotes the Rectified Linear Unit function, which applies a non-linear transformation to the linear combination of inputs, weights and biases. No activation function is applied in the equation for the output Y, as a continuous value is expected as the output. As a side note, a non-linear function such as sigmoid or softmax would be needed in the second equation if the output were expected to be categorical.


Matrix algebra for the forward pass
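In the notation of the model class shown further down (W_xh, b_h, W_hy, b_y), the two equations can be written as follows. This is reconstructed from the compute_output() method rather than copied from the original figure, and the symbol H for the hidden layer values is an assumption.

```latex
\begin{aligned}
H &= \mathrm{relu}(X W_{xh} + b_h) \\
Y &= H W_{hy} + b_y
\end{aligned}
```

Here X has one row per observation, so H has shape (number of observations, 20) and Y has shape (number of observations, 1).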

Computing the loss, backpropagating it, and updating the weights and biases take only a few lines of code (in the loss() and backward() methods of the model class).

The rather long snippet below shows how the model can be implemented in a class. The forward pass computation itself lives in the additional compute_output() method, so that forward() can run it on the hardware device (CPU or GPU) chosen by the user for model training and evaluation.
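For readers curious about what the automatic differentiation has to work out, here is a hand-written NumPy sketch of the same loss, gradients and update step. It is not the TF code used in this article: the params dictionary and the helpers mse_loss() and gradients() are hypothetical names, and the batch of 8 random observations is an assumption. In the model class, tf.GradientTape derives these gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, size_input, size_hidden, size_output = 8, 3, 20, 1
X = rng.standard_normal((n, size_input))
y = rng.standard_normal((n, size_output))
params = {
    "W_xh": rng.standard_normal((size_input, size_hidden)),
    "b_h": rng.standard_normal((1, size_hidden)),
    "W_hy": rng.standard_normal((size_hidden, size_output)),
    "b_y": rng.standard_normal((1, size_output)),
}

def mse_loss(p, X, y):
    """Forward pass followed by the mean squared error."""
    H = np.maximum(X @ p["W_xh"] + p["b_h"], 0.0)
    Y = H @ p["W_hy"] + p["b_y"]
    return np.mean((Y - y) ** 2)

def gradients(p, X, y):
    """Backpropagate the MSE through the two layers by hand."""
    A = X @ p["W_xh"] + p["b_h"]          # pre-activation of the hidden layer
    H = np.maximum(A, 0.0)                # relu
    Y = H @ p["W_hy"] + p["b_y"]
    dY = 2.0 * (Y - y) / Y.size           # d(loss)/dY
    dA = (dY @ p["W_hy"].T) * (A > 0)     # chain rule through relu
    return {
        "W_xh": X.T @ dA,
        "b_h": dA.sum(axis=0, keepdims=True),
        "W_hy": H.T @ dY,
        "b_y": dY.sum(axis=0, keepdims=True),
    }

grads = gradients(params, X, y)

# Sanity check one gradient entry against a finite-difference estimate
eps = 1e-6
bumped = {k: v.copy() for k, v in params.items()}
bumped["W_hy"][0, 0] += eps
numeric = (mse_loss(bumped, X, y) - mse_loss(params, X, y)) / eps

# One gradient descent step with the learning rate used in this article (1e-4)
loss_before = mse_loss(params, X, y)
updated = {k: params[k] - 1e-4 * grads[k] for k in params}
loss_after = mse_loss(updated, X, y)
print(grads["W_hy"][0, 0], numeric, loss_before, loss_after)
```

The finite-difference check is only there to show that the hand-derived gradient agrees with a numerical estimate; it is not needed when a gradient tape does the work.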

# Define class to build model
class Model(object):
  def __init__(self, size_input, size_hidden, size_output, device=None):
    """
    size_input: int, size of input layer
    size_hidden: int, size of hidden layer
    size_output: int, size of output layer
    device: str or None, either 'cpu' or 'gpu' or None. If None, the device to be used will be decided automatically during Eager Execution
    """
    self.size_input, self.size_hidden, self.size_output, self.device = \
      size_input, size_hidden, size_output, device
    
    # Initialize weights between input layer and hidden layer
    self.W_xh = tfe.Variable(tf.random_normal([self.size_input, self.size_hidden]))
    # Initialize weights between hidden layer and output layer
    self.W_hy = tfe.Variable(tf.random_normal([self.size_hidden, self.size_output]))
    # Initialize biases for hidden layer
    self.b_h = tfe.Variable(tf.random_normal([1, self.size_hidden]))
    # Initialize biases for output layer
    self.b_y = tfe.Variable(tf.random_normal([1, self.size_output]))
    
    # Define variables to be updated during backpropagation
    self.variables = [self.W_xh, self.W_hy, self.b_h, self.b_y]
    
  def forward(self, X):
    """
    Method to do forward pass
    X: Tensor, inputs
    """
    if self.device is not None:
      with tf.device('gpu:0' if self.device=='gpu' else 'cpu:0'):
        self.y = self.compute_output(X)
    else:
      # Leave choice of device to default
      self.y = self.compute_output(X)
      
    return self.y
  
  def loss(self, y_pred, y_true):
    '''
    Method to compute the mean squared error loss
    y_pred - Tensor of shape (batch_size, size_output)
    y_true - Tensor of shape (batch_size,) or (batch_size, size_output)
    '''
    y_true_tf = tf.cast(tf.reshape(y_true, (-1, self.size_output)), dtype=tf.float32)
    # Cast y_pred to float32
    y_pred_tf = tf.cast(y_pred, dtype=tf.float32)
    return tf.losses.mean_squared_error(y_true_tf, y_pred_tf)
  
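  def backward(self, X_train, y_train):
    """
    Method to backpropagate the loss and update the weights and biases
    X_train: Tensor, inputs
    y_train: Tensor, true outputs
    """
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-4)
    # Record the forward pass so that gradients can be computed from it
    with tf.GradientTape() as tape:
      predicted = self.forward(X_train)
      current_loss = self.loss(predicted, y_train)
    # Gradients of the loss with respect to each weight and bias
    grads = tape.gradient(current_loss, self.variables)
    # Apply one gradient descent step to the weights and biases
    optimizer.apply_gradients(zip(grads, self.variables),
                              global_step=tf.train.get_or_create_global_step())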
        
        
  def compute_output(self, X):
    """
    Custom method to obtain output tensor during forward pass
    """
    # Cast X to float32
    X_tf = tf.cast(X, dtype=tf.float32)
    # Compute values in hidden layer
    a = tf.matmul(X_tf, self.W_xh) + self.b_h
    l_h = tf.nn.relu(a)
    # Compute output
    output = tf.matmul(l_h, self.W_hy) + self.b_y
    return output

You may have noticed the tf.cast() calls in the class. They are there because of a data type mismatch: NumPy generates float64 arrays by default, so from_tensor_slices() in the earlier snippet yields tf.float64 tensors, whereas the weights and biases created with tf.random_normal() are tf.float32. Operations such as tf.matmul() require both operands to have the same data type and raise an error otherwise. I have not tried Eager Execution on TF 2.0, so I am not sure whether this behaves differently in the new version. What I do know is that the issue definitely occurs in the version of TF that I used for this example (i.e. 1.13.1), so it is something to take note of when using Eager Execution on older versions of TF.
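The data type difference is easy to see on the NumPy side. The sketch below is an analogue only: NumPy silently upcasts mixed float32/float64 operands, whereas TF raises an error, and the float32 array W is just a stand-in for the model weights.

```python
import numpy as np

X = np.random.randn(4, 3)                       # NumPy samples are float64 by default
W = np.random.randn(3, 20).astype(np.float32)   # stand-in for float32 weights

print(X.dtype, W.dtype)      # the two data types differ

X32 = X.astype(np.float32)   # NumPy analogue of tf.cast(X, dtype=tf.float32)
H = X32 @ W                  # both operands are now float32
print(H.dtype)
```

An alternative to casting inside the model would be to generate the data as float32 in the first place, for example with .astype(np.float32) right after np.random.randn().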


Training the Model

After preparing the data and building the model, the next step is to train the model. Model training is pretty simple, with only a few lines of code needed. The basic idea is to repeat the following for each batch of data in every epoch: feed the input tensor through the model to get the prediction tensor, compute the loss, backpropagate the loss, and update the weights and biases. In every epoch, the training data is split randomly into different batches, to increase the computational efficiency of model training and to help the model generalize better. The following snippet illustrates how training can be done with Eager Execution.

# Set number of epochs
NUM_EPOCHS = 5

# Initialize model, letting device be selected by default during Eager Execution
model_default = Model(size_input, size_hidden, size_output)

time_start = time.time()
for epoch in range(NUM_EPOCHS):
  loss_total = tfe.Variable(0, dtype=tf.float32)
  num_batches = 0
  # Reshuffle the whole training set and split it into batches of 4 for this epoch
  train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(X_train.shape[0], seed=epoch).batch(4)
  for inputs, outputs in train_ds:
    preds = model_default.forward(inputs)
    loss_total = loss_total + model_default.loss(preds, outputs)
    num_batches += 1
    model_default.backward(inputs, outputs)
  # Each batch loss is already a mean, so average over the number of batches
  print('Epoch {} - Average MSE: {:.4f}'.format(epoch + 1, loss_total.numpy() / num_batches))
time_taken = time.time() - time_start

print('\nTotal time taken for training (seconds): {:.2f}'.format(time_taken))
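The per-epoch reshuffling can be pictured with plain NumPy. The sketch below is an illustration under assumptions: shuffled_batches() is a hypothetical helper, the tiny 8-row dataset is made up, and the whole dataset is permuted at once, which corresponds to a tf.data shuffle buffer as large as the dataset (a smaller buffer only shuffles within a sliding window).

```python
import numpy as np

def shuffled_batches(X, y, batch_size, seed):
    """Yield (inputs, outputs) batches after permuting all rows with a seeded generator."""
    order = np.random.default_rng(seed).permutation(X.shape[0])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Tiny dataset where row i of X is [3i, 3i+1, 3i+2] and y[i] = i
X_demo = np.arange(24, dtype=float).reshape(8, 3)
y_demo = np.arange(8, dtype=float)

for epoch in range(2):
    # A different seed per epoch gives a different split into batches
    targets = [yb.tolist() for _, yb in shuffled_batches(X_demo, y_demo, 4, seed=epoch)]
    print(epoch, targets)
```

Every observation still appears exactly once per epoch, and each input row stays paired with its own output; only the grouping into batches changes.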

Evaluating the Model

The final step is to evaluate the model using the test set. The code to do that is similar to the training code, but without the backpropagation and the updates of weights and biases.

test_loss_total = tfe.Variable(0, dtype=tf.float32)
num_test_batches = 0
for inputs, outputs in test_ds:
  preds = model_default.forward(inputs)
  test_loss_total = test_loss_total + model_default.loss(preds, outputs)
  num_test_batches += 1
# Each batch loss is already a mean, so average over the number of test batches
print('Average Test MSE: {:.4f}'.format(test_loss_total.numpy() / num_test_batches))
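A note on how the average is formed: the loss returned for each batch is already a mean over that batch. Summing the batch losses and dividing by the number of batches therefore recovers the overall MSE when all batches have the same size, while dividing by the number of observations gives a value that is smaller by a factor of the batch size. The NumPy sketch below checks this on made-up predictions; the random arrays are assumptions and stand in for real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.standard_normal(200)
y_pred = rng.standard_normal(200)
batch_size = 4

# Per-batch MSE, analogous to one loss call per batch
batch_losses = [
    np.mean((y_pred[i:i + batch_size] - y_true[i:i + batch_size]) ** 2)
    for i in range(0, len(y_true), batch_size)
]

mse_overall = np.mean((y_pred - y_true) ** 2)
mse_from_batches = sum(batch_losses) / len(batch_losses)   # divide by number of batches
mse_per_observation = sum(batch_losses) / len(y_true)      # divide by number of observations
print(mse_overall, mse_from_batches, mse_per_observation)
```

If the last batch were smaller than the others, a weighted average (weighting each batch loss by its batch size) would be needed to match the overall MSE exactly.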

Conclusion

While Eager Execution is pretty straightforward to use, I would like to emphasize again that it is a low-level approach. I would advise against using Eager Execution, unless: 1) you are doing work that requires you to build a deep learning model from scratch (e.g. research/academic work on deep learning models), 2) you are trying to understand the mathematical stuff that is going on behind deep learning, or 3) you just like to build things from scratch.

Having said that, I think Eager Execution is a pretty good way to understand a bit better what actually happens when we do deep learning, without having to juggle complicated graphs or the other confusing stuff that comes with the conventional TF approach.

The Google Colab notebook that I created for this example can be found here.

This article was originally published here in Towards Data Science.
