Getting Started with PyTorch

Preface

Back in 2017 I already wanted to learn the deep learning framework TensorFlow. It was incredibly hot at the time; I often watched a PhD colleague at the company use TensorFlow on all kinds of impressive projects, running the best GPUs and doing the coolest things. Knowing nothing about neural networks, I could only stand by and watch. I told myself I had to find time to learn TensorFlow. Even though the work I was doing then needed no algorithms, let alone neural networks or deep learning, the technology was so hot that I still wanted to pick it up; an extra skill never hurts.

However, I was never really into machine learning (in truth, my math foundation was too weak). I studied it for a while, for example playing around on Kaggle, but my math was so poor that I could not even follow the formulas in the so-called beginner-level "watermelon book". I could not make progress, got no positive feedback from studying, and eventually gave up.

Now it is 2021, and a machine learning online course I happened to take this year made me pick those things up again. The course requires a deep learning framework for its projects: either TensorFlow or PyTorch. Not knowing which to choose, I searched on Zhihu and came to this conclusion: PyTorch has had stronger momentum than TensorFlow recently. TensorFlow is still mainstream in industry, but it may well be overtaken by PyTorch; in academia, PyTorch is already ahead. PyTorch is also more Python-friendly. Hence this article: I chose PyTorch.

PyTorch Basics

Pytorch Tensor vs Numpy Array

  • In PyTorch, a matrix (array) is called a tensor.
    A 3*3 matrix is simply a 3*3 tensor.
    This is very similar to numpy:

Run the following code:

# import numpy library
import numpy as np
# numpy array
array = [[1,2,3],[4,5,6]]
first_array = np.array(array) # 2x3 array
print("Array Type: {}".format(type(first_array))) # type
print("Array Shape: {}".format(np.shape(first_array))) # shape
print(first_array)

# import pytorch library
import torch
# pytorch array
tensor = torch.Tensor(array)
print("Array Type: {}".format(tensor.type)) # type
print("Array Shape: {}".format(tensor.shape)) # shape
print(tensor)
  • Some operations behave the same in numpy and PyTorch:
    np.ones() = torch.ones()
    np.random.rand() = torch.rand()
# numpy ones
print("Numpy {}\n".format(np.ones((2,3))))
# pytorch ones
print(torch.ones((2,3)))

For constructing and manipulating matrices, numpy often feels more convenient, so it is common to convert a network's tensor results back to numpy arrays for inspection and display.
The two can be converted back and forth as follows:

torch.from_numpy(): from numpy to tensor
numpy(): from tensor to numpy
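
A minimal round-trip sketch of that conversion (the variable names here are just illustrative):

import numpy as np
import torch

np_array = np.ones((2, 3))
# numpy -> tensor
t = torch.from_numpy(np_array)
# tensor -> numpy
back = t.numpy()
print(type(t), type(back))
# note: torch.from_numpy shares memory with the numpy array (no copy is made)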

Below are some math operations.

Tensor addition, subtraction, multiplication, and division:

# create tensor
tensor = torch.ones(3,3)
print("\n",tensor)
# Resize
print("{}{}\n".format(tensor.view(9).shape,tensor.view(9)))
# Addition
print("Addition: {}\n".format(torch.add(tensor,tensor)))
# Subtraction
print("Subtraction: {}\n".format(tensor.sub(tensor)))
# Element wise multiplication
print("Element wise multiplication: {}\n".format(torch.mul(tensor,tensor)))
# Element wise division
print("Element wise division: {}\n".format(torch.div(tensor,tensor)))
# Mean
tensor = torch.Tensor([1,2,3,4,5])
print("Mean: {}".format(tensor.mean()))
# Standard deviation (std)
print("std: {}".format(tensor.std()))

Variables

  • Variables accumulate gradients.
  • For backpropagation in a neural network we need to compute gradients, so we need a way to handle them.
  • The difference between a Variable and a tensor is that a Variable accumulates gradients.
  • We can do addition, subtraction, multiplication, and division with Variables just like with tensors.
  • To run backward propagation we need Variables.
# import variable from pytorch library
from torch.autograd import Variable
# define variable
var = Variable(torch.ones(3), requires_grad = True)
var
# output: tensor([1., 1., 1.], requires_grad=True)
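
Note: since PyTorch 0.4, Variable has been merged into Tensor, so a tensor created with requires_grad=True behaves the same way. The Variable API above still works; a more modern equivalent is simply:

import torch
# a plain tensor that tracks gradients (no Variable wrapper needed)
var = torch.ones(3, requires_grad=True)
print(var)  # tensor([1., 1., 1.], requires_grad=True)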

Suppose we have the equation y = x^2.
Define x = [2, 4] as a Variable.
After the computation we get y = [4, 16] (since y = x^2).
Summarize with o: o = (1/2)*sum(y) = (1/2)*sum(x^2).
The derivative of o with respect to x is x.
So the result equals x, and the gradients are [2, 4].

# lets make basic backward propagation
# we have an equation that is y = x^2
array = [2,4]
tensor = torch.Tensor(array)
x = Variable(tensor, requires_grad = True)
y = x**2
print(" y = ",y)
# recap o equation o = 1/2*sum(y)
o = (1/2)*sum(y)
print(" o = ",o)
# backward
o.backward() # calculates gradients
# As defined above, Variables accumulate gradients. Here there is only one Variable, x,
# so x should have gradients.
# Lets look at gradients with x.grad
print("gradients: ",x.grad)

Logistic Regression

Let's look at a digit recognition example.

  • The steps are as follows:
  1. Import libraries
  2. Prepare the dataset
  • Use the MNIST dataset.
  • 28*28 images, 10 labels from 0 to 9.
  • The data is not normalized, so we divide every pixel by 255, the maximum pixel value.
  • To split the data we use sklearn's train_test_split.
  • 80% train data, 20% test data.
  • Build feature and target tensors. In the next step we create Variables from these tensors; we also need Variables so that gradients can be accumulated.
  • batch_size: say we have 1000 samples; we can train on all 1000 at once, or split them into 10 groups of 100 samples and train the groups one after another. batch_size is the group size.
  • epoch: one epoch means training on all samples once.
  • In this experiment we have 33600 training samples and set batch_size to 100. We set the number of epochs to 29 (accuracy is already high by epoch 29), so the data is trained 29 times. How many iterations does that take? (See the short sketch after this list.)
    Training once = training on all 33600 samples.
    But we split the data into 336 groups (group size = batch_size = 100), so 1 epoch takes 336 iterations. With 29 epochs, that is 9744 iterations in total.
  • TensorDataset(): a dataset that wraps tensors.
  • DataLoader(): combines a dataset and a sampler.
  • Visualize one of the images in the dataset.
  3. Create the Logistic Regression model

  4. Instantiate the model

  • input_dim = 28 * 28 # size of image px*px
  • output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
  • create model
  5. Instantiate the loss
  • Cross entropy loss
  6. Instantiate the optimizer
  • SGD optimizer
  7. Training the model
  8. Prediction
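
A quick sanity check of that batch/epoch/iteration arithmetic (the numbers are the ones quoted above; this is just a sketch):

num_samples = 33600
batch_size = 100
num_epochs = 29
iters_per_epoch = num_samples // batch_size   # 336 iterations per epoch
total_iters = iters_per_epoch * num_epochs    # 336 * 29 = 9744 iterations
print(iters_per_epoch, total_iters)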

The code is as follows:

# import the libraries this section needs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
# Prepare Dataset
# load data
train = pd.read_csv(r"../input/train.csv", dtype = np.float32)
# split data into features(pixels) and labels(numbers from 0 to 9)
targets_numpy = train.label.values
features_numpy = train.loc[:, train.columns != "label"].values/255 # normalization
# train test split. Size of train data is 80% and size of test data is 20%.
features_train, features_test, targets_train, targets_test = train_test_split(
    features_numpy, targets_numpy, test_size = 0.2, random_state = 42)
# create feature and targets tensor for train set. As you remember we need variable to accumulate gradients. Therefore first we create tensor, then we will create variable
featuresTrain = torch.from_numpy(features_train)
targetsTrain = torch.from_numpy(targets_train).type(torch.LongTensor) # data type is long
# create feature and targets tensor for test set.
featuresTest = torch.from_numpy(features_test)
targetsTest = torch.from_numpy(targets_test).type(torch.LongTensor) # data type is long
# batch_size, epoch and iteration
batch_size = 100
n_iters = 10000
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)
# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain,targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest,targetsTest)
# data loader
train_loader = DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = DataLoader(test, batch_size = batch_size, shuffle = False)
# visualize one of the images in data set
plt.imshow(features_numpy[10].reshape(28,28))
plt.axis("off")
plt.title(str(targets_numpy[10]))
plt.savefig('graph.png')
plt.show()

# Create Logistic Regression Model
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        # Linear part
        self.linear = nn.Linear(input_dim, output_dim)
        # There should be a logistic function here, right?
        # However, in pytorch the logistic/softmax part lives inside the loss function,
        # so it is not forgotten; it is simply applied later (in nn.CrossEntropyLoss).

    def forward(self, x):
        out = self.linear(x)
        return out

# Instantiate Model Class
input_dim = 28*28 # size of image px*px
output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
# create logistic regression model
model = LogisticRegressionModel(input_dim, output_dim)
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
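
As the comment in the model notes, nn.CrossEntropyLoss already applies log-softmax internally, which is why forward() returns raw linear outputs (logits). A small sketch of that equivalence (the logits and targets below are made-up examples):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # fake model outputs: 4 samples, 10 classes
targets = torch.tensor([0, 3, 9, 1])   # fake labels
ce = nn.CrossEntropyLoss()(logits, targets)
# equivalent: log-softmax followed by negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))         # True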
  • Training part
# Training the Model
count = 0
loss_list = []
iteration_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Define variables
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        # Clear gradients
        optimizer.zero_grad()
        # Forward propagation
        outputs = model(train)
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        # Calculate gradients
        loss.backward()
        # Update parameters
        optimizer.step()

        count += 1

        # Prediction
        if count % 50 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:
                test = Variable(images.view(-1, 28*28))
                # Forward propagation
                outputs = model(test)
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                # Total number of labels
                total += len(labels)
                # Total correct predictions
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
        if count % 100 == 0:
            # Print Loss
            print('Iteration: {} Loss: {} Accuracy: {}%'.format(count, loss.data, accuracy))

Artificial Neural Network (ANN)

Logistic regression works for classification, but as the complexity (non-linearity) of the data increases, its accuracy drops.
So we need to increase the model's complexity, which we do by adding more non-linear functions as hidden layers.

What we expect is that as complexity increases, by using more hidden layers, the model's accuracy will be higher.

ANN steps:

  1. Import libraries

  2. Prepare the dataset
    Exactly the same as the previous section.
    We use the same dataset, so we only need train_loader and test_loader.
    Batch size, epochs, and number of iterations are also the same.

  3. Create the ANN model

  • Add 3 hidden layers.
  • Use ReLU, Tanh and ELU activation functions.
  4. Instantiate the model
  • input_dim = 28*28 # size of image px*px
  • output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
  • The hidden layer dimension is 150; this number is arbitrary, and you can try other values.
  • create model
  5. Instantiate the loss
  • Cross entropy loss
  • It also has softmax (logistic function) in it.
  6. Instantiate the optimizer
  • SGD optimizer
  7. Training the model

  8. Prediction

With hidden layers the model's accuracy reaches about 95%, much higher than before.

# Create ANN Model
class ANNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(ANNModel, self).__init__()
        # Linear function 1: 784 --> 150
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-linearity 1
        self.relu1 = nn.ReLU()
        # Linear function 2: 150 --> 150
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 2
        self.tanh2 = nn.Tanh()
        # Linear function 3: 150 --> 150
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 3
        self.elu3 = nn.ELU()
        # Linear function 4 (readout): 150 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        # Linear function 2
        out = self.fc2(out)
        # Non-linearity 2
        out = self.tanh2(out)
        # Linear function 3
        out = self.fc3(out)
        # Non-linearity 3
        out = self.elu3(out)
        # Linear function 4 (readout)
        out = self.fc4(out)
        return out

# instantiate ANN
input_dim = 28*28
hidden_dim = 150 # hidden layer dim is a hyperparameter that should be tuned; 150 is an arbitrary choice
output_dim = 10
# Create ANN
model = ANNModel(input_dim, hidden_dim, output_dim)
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.02
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# ANN model training
count = 0
loss_list = []
iteration_list = []
accuracy_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        # Clear gradients
        optimizer.zero_grad()
        # Forward propagation
        outputs = model(train)
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        # Calculating gradients
        loss.backward()
        # Update parameters
        optimizer.step()

        count += 1

        if count % 50 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:
                test = Variable(images.view(-1, 28*28))
                # Forward propagation
                outputs = model(test)
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                # Total number of labels
                total += len(labels)
                # Total correct predictions
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {} Loss: {} Accuracy: {} %'.format(count, loss.data, accuracy))

Convolutional Neural Network (CNN)

CNNs are very well suited to image classification.

CNN steps:

  1. Import libraries
  2. Prepare the dataset
  • Exactly the same as the previous sections.
  • We use the same dataset, so we only need train_loader and test_loader.
  3. Convolutional layer:
  • Creates feature maps with filters (kernels).
  • Padding: after applying a filter, the dimensions of the original image shrink. To preserve as much of the original information as possible, padding can be added around the input so the feature map keeps a larger dimension.
  • We use 2 convolutional layers.
  • out_channels is 16 (then 32 in the second layer).
  • Filter (kernel) size is 5*5.
  4. Pooling layer:
  • Prepares a condensed feature map from the convolutional layer's output (feature map).
  • We use 2 pooling layers.
  • Pooling size is 2*2.
  5. Flattening: flattens the feature maps.

  6. Fully connected layer:

  • It can be logistic regression, except that it should end with a softmax function.
  • We do not use an activation function in the FC layer.
  • We combine the convolutional part and logistic regression to create the CNN model.
  7. Instantiate the model
  • create model
  8. Instantiate the loss
  • Cross entropy loss
  • It also has softmax (logistic function) in it.
  9. Instantiate the optimizer
  • SGD optimizer
  10. Training the model

  11. Prediction
    With the convolutional layers, accuracy reaches about 98%. (A short sketch of where the 32 * 4 * 4 input size of the final linear layer comes from follows this list.)
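
A small sketch of the feature-map size arithmetic behind the 32 * 4 * 4 input of the final linear layer, using the kernel/stride/padding values from the model below:

def conv_out(size, kernel, stride=1, padding=0):
    # standard output-size formula for conv and pooling layers
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)            # conv1:    28 -> 24
size = conv_out(size, kernel=2, stride=2)  # maxpool1: 24 -> 12
size = conv_out(size, kernel=5)            # conv2:    12 -> 8
size = conv_out(size, kernel=2, stride=2)  # maxpool2:  8 -> 4
print(size)  # 4, so the flattened feature vector has 32 * 4 * 4 elements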

# Create CNN Model
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # Fully connected 1 (the flattened feature map is 32 channels of 4*4)
        self.fc1 = nn.Linear(32 * 4 * 4, 10)

    def forward(self, x):
        # x is expected to have shape (batch_size, 1, 28, 28)
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        # Max pool 1
        out = self.maxpool1(out)
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        # Max pool 2
        out = self.maxpool2(out)
        # flatten
        out = out.view(out.size(0), -1)
        # Linear function (readout)
        out = self.fc1(out)
        return out

# batch_size, epoch and iteration
batch_size = 100
n_iters = 2500
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)
# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain, targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest, targetsTest)
# data loader
train_loader = torch.utils.data.DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(test, batch_size = batch_size, shuffle = False)
# Create CNN
model = CNNModel()
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
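
The training loop for the CNN is not shown above; it is essentially the same as the ANN loop, except that each batch must be reshaped to (batch_size, 1, 28, 28) before being fed to the convolutional layers. A minimal sketch under that assumption:

# CNN model training (minimal sketch, mirroring the ANN loop above)
count = 0
for epoch in range(num_epochs):
    for images, labels in train_loader:
        # reshape the flat 784-pixel rows into (batch, channel, height, width)
        train = Variable(images.view(-1, 1, 28, 28))
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = model(train)
        loss = error(outputs, labels)
        loss.backward()
        optimizer.step()
        count += 1
        if count % 500 == 0:
            # evaluate on the test set
            correct, total = 0, 0
            for images, labels in test_loader:
                test = Variable(images.view(-1, 1, 28, 28))
                predicted = torch.max(model(test).data, 1)[1]
                total += len(labels)
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            print('Iteration: {} Loss: {} Accuracy: {} %'.format(count, loss.data, accuracy))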

Reference

https://www.kaggle.com/kanncaa1/pytorch-tutorial-for-deep-learning-lovers