From Classification to Generation - Building a CIFAR10 Classifier with PyTorch

1. Train a classifier using the CIFAR10 dataset.

1.1 Importing Packages

%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets return PIL images. ToTensor converts them to Tensors with values in the range [0, 1], and Normalize with a mean and standard deviation of 0.5 per channel then rescales them to the range [-1, 1].
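The rescaling can be checked by hand. The sketch below is plain Python, not the torchvision implementation; the helper names normalize and unnormalize are made up for illustration. It applies the per-value formula (x - mean) / std with mean = std = 0.5 and then inverts it, which is the same arithmetic as the img / 2 + 0.5 step used later when displaying images.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value normalization: (x - mean) / std."""
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    """Inverse mapping; with mean = std = 0.5 this equals y / 2 + 0.5."""
    return y * std + mean


# Illustrative pixel intensities after ToTensor, i.e. already in [0, 1].
samples = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(v) for v in samples]
restored = [unnormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The endpoints 0 and 1 map to -1 and 1, so the whole input range lands in [-1, 1].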

1.2 Loading Data

!wget https://model-community-picture.obs.cn-north-4.myhuaweicloud.com/ascend-zone/notebook_datasets/0bae26b00e9711f095dcfa163edcddae/data.zip

!unzip data.zip

Load and preprocess the CIFAR-10 dataset, create data loaders for training and testing, and define the class names in the dataset.

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Showing some training images

import matplotlib.pyplot as plt
import numpy as np

# Display image function 
def imshow(img): 
    img = img / 2 + 0.5 # unnormalize 
    npimg = img.numpy() 
    plt.imshow(np.transpose(npimg, (1, 2, 0))) 
    plt.show() 

# Get some random training images 
dataiter = iter(trainloader) 
images, labels = next(dataiter) 
# Display images 
imshow(torchvision.utils.make_grid(images)) 

# Print categories 
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(len(labels))))

1.3 Defining a Convolutional Neural Network

import torch.nn as nn
import torch_npu
import torch.nn.functional as F

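device = torch.device("npu:0" if torch.npu.is_available() else "cpu")
# Select the NPU only when one is available; skip this on the CPU fallback
if device.type == "npu":
    torch_npu.npu.set_device(device)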


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net().to(device)
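The input size of fc1, 16 * 5 * 5, follows from how each layer shrinks a 32x32 CIFAR-10 image. A minimal sketch of that arithmetic, assuming no padding and no dilation (the defaults used above); conv_out is a hypothetical helper, not a PyTorch function:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32                     # CIFAR-10 images are 32x32 pixels
side = conv_out(side, 5)      # conv1, 5x5 kernel         -> 28
side = conv_out(side, 2, 2)   # 2x2 max pool, stride 2    -> 14
side = conv_out(side, 5)      # conv2, 5x5 kernel         -> 10
side = conv_out(side, 2, 2)   # 2x2 max pool, stride 2    -> 5

flat_features = 16 * side * side  # conv2 has 16 output channels
print(side, flat_features)  # 5 400
```

If the kernel sizes or the input resolution change, the first argument of fc1 has to be recomputed in the same way.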

1.4 Defining a Loss Function and Optimizer

Use Classification Cross-Entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
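To make the two objects less opaque, here is a plain-Python sketch of what they compute for a single sample and a single scalar parameter. It is an illustration under simplifying assumptions, not PyTorch's implementation: cross_entropy and sgd_momentum_step are hypothetical helpers, the update is the documented SGD-with-momentum form without dampening or Nesterov, and nn.CrossEntropyLoss additionally averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Loss for one sample: negative log-softmax of the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: v = momentum * v + grad, then p = p - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


# Ten equal logits mean a uniform prediction, so the loss is ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 3))  # 2.303

# Hypothetical scalar parameter with a constant gradient of 2.0.
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, 2.0, v)  # v = 2.0, w = 1.0 - 0.001 * 2.0
w, v = sgd_momentum_step(w, 2.0, v)  # v = 0.9 * 2.0 + 2.0 = 3.8
```

The value ln(10), about 2.303, is the loss of a 10-class model that predicts every class with equal probability. The first average reported in the training output of section 1.5, 2.181, sits just below it, which is consistent with a network that starts out close to chance.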

1.5 Training the Network

Iterate through the data iterator, feed the input into the network, and optimize it.

for epoch in range(2): # Iterate through the dataset multiple times 
    running_loss = 0.0 
    for i, data in enumerate(trainloader, 0): 
        # Get the input; the data is a list of [inputs, labels] 
        inputs, labels = data[0].to(device), data[1].to(device) 

        # Zero the parameter gradients 
        optimizer.zero_grad() 

        # forward + backward + optimize 
        outputs = net(inputs) 
        loss = criterion(outputs, labels) 
        loss.backward() 
        optimizer.step() 

        # print statistics 
        running_loss += loss.item() 
        if i % 2000 == 1999: # Print every 2000 mini-batches 
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}') 
            running_loss = 0.0 

print('Finished Training')

Out:

[1, 2000] loss: 2.181
[1, 4000] loss: 1.835
[1, 6000] loss: 1.658
[1, 8000] loss: 1.576
[1, 10000] loss: 1.518
[1, 12000] loss: 1.481
[2, 2000] loss: 1.396
[2, 4000] loss: 1.375
[2, 6000] loss: 1.357
[2, 8000] loss: 1.344
[2, 10000] loss: 1.313
[2, 12000] loss: 1.282
Finished Training
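The number of lines in this log follows from the batch size and the reporting interval. A short arithmetic sketch, assuming the standard CIFAR-10 training split of 50,000 images (the variable names are chosen here for illustration):

```python
train_images = 50_000  # size of the CIFAR-10 training split
batch_size = 4         # as set in section 1.2
log_every = 2000       # reporting interval used in the training loop

batches_per_epoch = train_images // batch_size
reports_per_epoch = batches_per_epoch // log_every
unreported_batches = batches_per_epoch % log_every

print(batches_per_epoch, reports_per_epoch, unreported_batches)  # 12500 6 500
```

Each epoch therefore produces six reports, the last at mini-batch 12000. The loss accumulated over the final 500 mini-batches is reset at the start of the next epoch without being printed.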

Save the trained model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

1.6 Testing the Network on Test Data

We have now trained the network for two epochs, that is, two passes over the training set. To check whether it has learned anything, we predict a class label for each test image from the network’s output and compare it with the ground truth. Each prediction that matches the true label counts as correct.

First, show a few images from the test set together with their true labels.

dataiter = iter(testloader)

images, labels = next(dataiter)

imshow(torchvision.utils.make_grid(images))
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(len(labels))))

Reload the saved model (Note: Saving and reloading the model is not necessary here; it’s just to demonstrate how to do it):

net = Net().to(device)
net.load_state_dict(torch.load(PATH))

labels = labels.to(device)
images = images.to(device)

outputs = net(images)

View the prediction results:

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                              for j in range(4)))

Out:

Predicted: frog ship car ship
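torch.max(outputs, 1) returns two tensors: the largest score in each row and the index at which it occurs. The index is the predicted class. The sketch below mimics that selection in plain Python for two samples; the scores are made up for illustration and max_with_index is a hypothetical helper.

```python
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def max_with_index(row):
    """Return (largest value, its index) for one row of class scores."""
    best = max(range(len(row)), key=lambda k: row[k])
    return row[best], best


# Hypothetical network outputs: one row of 10 class scores per image.
outputs = [
    [0.1, -1.2, 0.3, 0.9, 0.2, 0.4, 2.5, -0.3, -0.8, -1.0],
    [1.1, 1.4, -0.5, -0.9, -1.3, -1.6, -2.0, -1.1, 3.2, 0.7],
]

predicted = [max_with_index(row)[1] for row in outputs]
print([classes[k] for k in predicted])  # ['frog', 'ship']
```

The scores themselves are unnormalized, so only their ordering within a row matters for the prediction.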

Test how the network performs on the entire dataset.

# Test model 
correct = 0 
total = 0 
# Since there is no training, there is no need to calculate the gradient of the output 
with torch.no_grad(): 
    for data in testloader: 
        images, labels = data 
        # Move the data to the NPU device 
        labels = labels.to(device) 
        images = images.to(device) 
        # Calculate the output by inputting the images into the network 
        outputs = net(images) 
        # The predicted class is the highest-scoring class 
        _, predicted = torch.max(outputs, 1) 
        total += labels.size(0) 
        correct += (predicted == labels).sum().item() 

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Out:

Accuracy of the network on the 10000 test images: 55 %

This is much better than chance, which would give 10% accuracy (picking one of the 10 classes at random). The network appears to have learned something.
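Note that the accuracy above is printed with integer division (//), which drops the fractional part. A small sketch with hypothetical counts, chosen only to illustrate the difference, alongside the chance baseline for 10 classes:

```python
# Hypothetical counts for illustration; not the result of the run above.
correct, total = 5567, 10_000

floor_pct = 100 * correct // total  # integer division truncates
exact_pct = 100 * correct / total   # true division keeps the fraction
chance_pct = 100 / 10               # uniform guess over 10 classes

print(floor_pct, exact_pct, chance_pct)  # 55 55.67 10.0
```

Formatting the true-division result, for example with :.1f as in the per-class report later in this section, keeps the extra precision.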

Which classes perform well, and which classes perform poorly?

# Prepare to calculate predictions for each class 
correct_pred = {classname: 0 for classname in classes} 
total_pred = {classname: 0 for classname in classes} 


# No gradient needed 
with torch.no_grad(): 
    for data in testloader: 
        images, labels = data 
        images = images.to(device) 
        outputs = net(images) 
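        _, predictions = torch.max(outputs, 1) 
        # labels stay on the CPU here, so bring the predictions back for the comparison 
        predictions = predictions.cpu() 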
        # Get the correct predictions for each class 
        for label, prediction in zip(labels, predictions): 
            if label == prediction: 
                correct_pred[classes[label]] += 1 
            total_pred[classes[label]] += 1 


# Print the accuracy for each class 
for classname, correct_count in correct_pred.items(): 
    accuracy = 100 * float(correct_count) / total_pred[classname] 
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

Out:

Accuracy for class: plane is 52.9 %
Accuracy for class: car is 68.2 %
Accuracy for class: bird is 32.2 %
Accuracy for class: cat is 44.3 %
Accuracy for class: deer is 48.4 %
Accuracy for class: dog is 41.3 %
Accuracy for class: frog is 69.6 %
Accuracy for class: horse is 55.1 %
Accuracy for class: ship is 69.1 %
Accuracy for class: truck is 69.5 %
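Per-class accuracy shows which classes are hard, but not what they are mistaken for. One possible extension is to count (true, predicted) pairs, which amounts to a confusion matrix. The sketch below uses a reduced, hypothetical class list and made-up labels; in the loop above, the same counter could be filled from labels and predictions.

```python
from collections import Counter

# Reduced, hypothetical class list and labels, for illustration only.
classes = ('cat', 'dog', 'bird')
true_labels = [0, 0, 0, 1, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 1, 0, 2, 0]

# Count how often each (true class, predicted class) pair occurs.
confusion = Counter(zip(true_labels, predicted))

# Per-class accuracy is the diagonal entry divided by the row total.
per_class_accuracy = {}
for t, name in enumerate(classes):
    row_total = sum(confusion[(t, p)] for p in range(len(classes)))
    per_class_accuracy[name] = confusion[(t, t)] / row_total

# The most frequent off-diagonal pair is the most common mistake.
errors = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}
(true_idx, pred_idx), count = max(errors.items(), key=lambda item: item[1])

print(f'{classes[true_idx]} -> {classes[pred_idx]}: {count}')  # cat -> dog: 2
```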