Tutorial 2: Define your own network

In the second tutorial, we will go over how to define your own network and optimize it in agnapprox. To keep things simple, we will continue to use the MNIST dataset.

%load_ext autoreload
%autoreload 2

from agnapprox.datamodules import MNIST

dm = MNIST(batch_size=128, num_workers=4)
dm.prepare_data()
dm.setup()

We start by defining an extremely simple neural network with two convolutional layers and one linear layer. Its accuracy is likely not going to be great, which is completely fine: a small model keeps the example easy to follow. We can define our network like any other network in vanilla PyTorch.

import torch.nn as nn

class TinyMNISTNet(nn.Module):
    """
    Definition of a tiny CNN for MNIST as a torch.nn.Module
    (two convolutional blocks followed by one linear layer)
    """

    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=5, stride=2, padding=0, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # 64 = 16 channels x 2 x 2 spatial positions for 28x28 input images
        self.linear1 = nn.Linear(64, num_classes)

    def forward(self, features):
        out = self.conv1(features)
        out = self.conv2(out)
        out = out.reshape(out.size(0), -1)
        out = self.linear1(out)
        return out
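The input size of 64 for linear1 follows from the layer parameters above. As a quick sanity check, the sketch below recomputes the feature map sizes with the standard output-size formula for convolution and pooling layers, assuming 28x28 MNIST inputs. The helper name out_size is introduced here for illustration only and is not part of agnapprox or PyTorch.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                             # MNIST images are 28x28
size = out_size(size, kernel=5, stride=1, padding=2)  # conv1 convolution -> 28
size = out_size(size, kernel=2, stride=2)             # conv1 max pooling -> 14
size = out_size(size, kernel=5, stride=2)             # conv2 convolution -> 5
size = out_size(size, kernel=2, stride=2)             # conv2 max pooling -> 2

flat_features = 16 * size * size                      # conv2 has 16 output channels
print(flat_features)  # 64
```

If you change a kernel size, stride, or padding in the network, rerunning this arithmetic tells you which value to pass to nn.Linear.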

Next, we wrap our network in a class that is derived from agnapprox.nets.ApproxNet. This adds a few extra features to our network, most importantly:

Let pytorch-lightning handle the model training
Track model metrics using MLFlow
Handle the different optimizer and scheduler configurations for the different training stages

The conversion of the vanilla Conv2d and Linear layers to approximate/noisy layers is handled by agnapprox internally. At the end of __init__, we call the gather_noisy_modules() method. This method identifies all target layers and replaces them with an upgraded version from the torchapprox library. These layer implementations bring the additional functionality needed for the different training modes.
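To get a feeling for what "identifying target layers" means, the following sketch walks over a model and collects its Conv2d and Linear layers using plain PyTorch. It is only an illustration of the idea and not the actual implementation of gather_noisy_modules(); the function find_target_layers and the demo model are hypothetical names made up for this example.

```python
import torch.nn as nn


def find_target_layers(model: nn.Module):
    """Return (name, module) pairs for every Conv2d and Linear layer in model."""
    return [
        (name, module)
        for name, module in model.named_modules()
        if isinstance(module, (nn.Conv2d, nn.Linear))
    ]


# Small stand-in model, used only to demonstrate the traversal
demo = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2, bias=False),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),
)

target_names = [name for name, _ in find_target_layers(demo)]
print(target_names)  # ['0', '3']
```

Activation, pooling, and normalization layers are skipped, since only the convolutional and linear layers are targets for the conversion described above.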

The full definition of an ApproxNet instance looks like this:

from agnapprox.nets import ApproxNet
import torch.optim as optim

class TinyApproxNet(ApproxNet):
    """
    Definition of training hyperparameters for
    approximate TinyMNISTNet
    """

    def __init__(self):
        super().__init__()
        # Instance of our model
        self.model = TinyMNISTNet(10)
        # Experiment name passed to MLFlow
        self.name = "TinyMNISTNet"
        # TopK metrics to keep track of
        self.topk = (1,)
        # Default number of epochs for each of the training stages
        # can be overridden by passing 'epochs=...' to the respective training functions
        self.epochs = {
            "baseline": 5,
            "gradient_search": 2,
            "qat": 1,
            "approx": 3,
        }
        # Maximum number of GPUs to train on if available
        self.num_gpus = 1
        # Pass model to agnapprox to identify target layers and upgrade them to noisy/approximate layers
        self.gather_noisy_modules()

    # Define the respective optimizers, schedulers, learning rates, etc. for each stage
    def _baseline_optimizers(self):
        optimizer = optim.SGD(
            self.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
        )
        scheduler = optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.75)
        return [optimizer], [scheduler]

    def _qat_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=1e-3, momentum=0.9)
        return [optimizer], []

    def _gs_optimizers(self):
        # Gradient search reuses the QAT optimizer configuration
        return self._qat_optimizers()
 
    def _approx_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=1e-3, momentum=0.9)
        scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
        return [optimizer], [scheduler]

CUDA not found, running on CPU
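To see what the StepLR schedulers configured above do to the learning rate, the sketch below reproduces the schedule in plain Python: the rate is multiplied by gamma once every step_size epochs. The helper step_lr is a hypothetical name for illustration. The epoch counts are the defaults from self.epochs, and gamma=0.1 for the approximate stage is the PyTorch default that applies when no gamma is passed.

```python
def step_lr(base_lr: float, step_size: int, gamma: float, epochs: int) -> list:
    """Learning rate used in each epoch under a StepLR-style schedule."""
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(epochs)]


# Baseline stage: StepLR(optimizer, 3, gamma=0.75), 5 epochs by default
baseline = step_lr(0.1, step_size=3, gamma=0.75, epochs=5)

# Approximate retraining stage: StepLR(optimizer, 2), 3 epochs by default
approx = step_lr(1e-3, step_size=2, gamma=0.1, epochs=3)

print([round(lr, 6) for lr in baseline])
print([round(lr, 6) for lr in approx])
```

With the default epoch counts, the baseline learning rate stays at 0.1 for three epochs and then drops to 0.075, while the approximate stage drops from 1e-3 to 1e-4 for its last epoch.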

After setting up the network like this, we can run the individual training stages, just like we’ve seen in the first tutorial.

model = TinyApproxNet()
model.train_baseline(dm, test=True)

GPU available: False, used: False

TPU available: False, using: 0 TPU cores

IPU available: False, using: 0 IPUs

HPU available: False, using: 0 HPUs

  | Name  | Type         | Params
---------------------------------------
0 | model | TinyMNISTNet | 4.1 K 
---------------------------------------
4.1 K     Trainable params
0         Non-trainable params
4.1 K     Total params
0.016     Total estimated model params size (MB)

/home/elias/agn-approx/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:653: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
  rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      test_acc_top1         0.9775999784469604
        test_loss           0.07349219918251038
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
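The parameter count reported in the model summary can be checked by hand from the layer definitions. The sketch below adds up the learnable parameters of TinyMNISTNet, assuming that the summary counts only learnable parameters (the BatchNorm running statistics are buffers, not parameters) and that each parameter is stored as a 32-bit float.

```python
conv1 = 1 * 8 * 5 * 5    # in_channels * out_channels * kernel height * kernel width, no bias
bn1 = 2 * 8              # one scale and one shift per channel
conv2 = 8 * 16 * 5 * 5   # no bias
bn2 = 2 * 16
linear1 = 64 * 10 + 10   # weight matrix plus bias vector

total_params = conv1 + bn1 + conv2 + bn2 + linear1
size_mb = total_params * 4 / 1e6  # 4 bytes per 32-bit float

print(total_params)       # 4098
print(round(size_mb, 3))  # 0.016
```

Both values agree with the "4.1 K" total parameters and the 0.016 MB estimate shown in the summary.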