Tutorial 2: Define your own network
In this second tutorial, we will go over how to define your own network and optimize it in agnapprox. To keep things simple, we will continue to use the MNIST dataset.
%load_ext autoreload
%autoreload 2
from agnapprox.datamodules import MNIST
dm = MNIST(batch_size=128, num_workers=4)
dm.prepare_data()
dm.setup()
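As a quick optional sanity check: assuming the MNIST datamodule follows the standard LightningDataModule API and exposes train_dataloader(), we can pull a single batch to confirm the input shapes.

# Optional sanity check, assuming the standard LightningDataModule API
images, labels = next(iter(dm.train_dataloader()))
print(images.shape)  # expected: torch.Size([128, 1, 28, 28])
print(labels.shape)  # expected: torch.Size([128])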
We start by defining an extremely simple neural network with two convolutional layers and one linear layer. Performance is likely not going to be great, which is completely fine because it allows us to keep things simple. We can define our network like any other network in vanilla PyTorch.
import torch.nn as nn


class TinyMNISTNet(nn.Module):
    """
    Definition of a small LeNet-style architecture as a torch.nn.Module
    """

    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=5, stride=2, padding=0, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear1 = nn.Linear(64, num_classes)

    def forward(self, features):
        out = self.conv1(features)
        out = self.conv2(out)
        out = out.reshape(out.size(0), -1)
        out = self.linear1(out)
        return out
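As a quick smoke test, we can push a dummy batch through the network to verify that the layer dimensions line up: with 28x28 MNIST inputs, conv1 yields 8x14x14 feature maps, conv2 yields 16x2x2 = 64 flattened features, which matches the input size of linear1.

import torch

# Smoke test: a random batch of four MNIST-sized images
net = TinyMNISTNet(num_classes=10)
dummy = torch.randn(4, 1, 28, 28)
print(net(dummy).shape)  # expected: torch.Size([4, 10])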
Next, we wrap our network in a class that is derived from agnapprox.nets.ApproxNet. This adds a few extra features to our network, most importantly:
- Let pytorch-lightning handle the model training
- Track model metrics using MLFlow
- Handle the different optimizer and scheduler configurations for the different training stages
The conversion of the vanilla Conv2d and Linear layers to approximate/noisy layers is handled by agnapprox internally. At the end of our __init__, we call the gather_noisy_modules() method. It identifies all target layers and replaces them with an upgraded version from the torchapprox library. These layer implementations bring additional functionality that implements the different training modes. A short sketch for inspecting the upgraded layers follows the class definition below.
The full definition of an ApproxNet instance looks like this:
from agnapprox.nets import ApproxNet
import torch.optim as optim


class TinyApproxNet(ApproxNet):
    """
    Definition of training hyperparameters for
    the approximate TinyMNISTNet
    """

    def __init__(self):
        super().__init__()

        # Instance of our model
        self.model = TinyMNISTNet(10)

        # Experiment name passed to MLFlow
        self.name = "TinyMNISTNet"

        # Top-K metrics to keep track of
        self.topk = (1,)

        # Default number of epochs for each of the training stages,
        # can be overridden by passing 'epochs=...' to the respective training functions
        self.epochs = {
            "baseline": 5,
            "gradient_search": 2,
            "qat": 1,
            "approx": 3,
        }

        # Maximum number of GPUs to train on if available
        self.num_gpus = 1

        # Pass model to agnapprox to identify target layers
        # and upgrade them to noisy/approximate layers
        self.gather_noisy_modules()

    # Define the respective optimizers, schedulers, learning rates, etc. for each stage
    def _baseline_optimizers(self):
        optimizer = optim.SGD(
            self.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
        )
        scheduler = optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.75)
        return [optimizer], [scheduler]

    def _qat_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=1e-3, momentum=0.9)
        return [optimizer], []

    def _gs_optimizers(self):
        # Gradient search reuses the QAT optimizer configuration
        return self._qat_optimizers()

    def _approx_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=1e-3, momentum=0.9)
        scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
        return [optimizer], [scheduler]
CUDA not found, running on CPU
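To verify what gather_noisy_modules() did, we can list the model's leaf modules. This is an illustrative sketch; the exact class names of the replacement layers are defined by the torchapprox library, so the printed types may differ.

probe = TinyApproxNet()
for name, module in probe.model.named_modules():
    # print leaf modules only; Conv2d/Linear should now appear as their
    # torchapprox replacements
    if len(list(module.children())) == 0:
        print(f"{name}: {type(module).__name__}")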
After setting up the network like this, we can run the individual training stages, just as we saw in the first tutorial.
model = TinyApproxNet()
model.train_baseline(dm, test=True)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
---------------------------------------
0 | model | TinyMNISTNet | 4.1 K
---------------------------------------
4.1 K Trainable params
0 Non-trainable params
4.1 K Total params
0.016 Total estimated model params size (MB)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Test metric DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test_acc_top1 0.9775999784469604
test_loss 0.07349219918251038
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
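From here, the remaining training stages can be run in the same way. The following is only a sketch: apart from train_baseline, the method names are assumptions inferred from the stage keys in self.epochs ("gradient_search", "qat", "approx"), so consult the agnapprox documentation for the exact API.

# Sketch only: these method names are assumptions inferred from the
# stage keys in self.epochs, not confirmed agnapprox API
model.train_gradient_search(dm)
model.train_qat(dm)
model.train_approx(dm)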