Getting Started with PyTorch

Preface

Back in 2017 I already wanted to learn the deep learning framework TensorFlow. It was incredibly hot at the time; I often watched a PhD colleague at the company use TensorFlow on all kinds of impressive projects, running the best GPUs and doing the coolest things. Knowing nothing about neural networks, I could only stand by and watch. I told myself I had to find time to learn TensorFlow. Even though the work I was doing then needed no algorithms, let alone neural networks or deep learning, the technology was so hot that I still wanted to pick it up; an extra skill never hurts.

However, I was never really into machine learning (in truth, my math foundation was too weak). I studied it for a while, for example playing around on Kaggle, but my math was so poor that I could not even follow the formulas in the so-called beginner-level "watermelon book". I could not make progress, got no positive feedback from studying, and eventually gave up.

Now it is 2021, and a machine learning online course I happened to take this year made me pick those things up again. The course requires a deep learning framework for its projects: either TensorFlow or PyTorch. Not knowing which to choose, I searched on Zhihu and came to this conclusion: PyTorch has had stronger momentum than TensorFlow recently. TensorFlow is still mainstream in industry, but it may well be overtaken by PyTorch; in academia, PyTorch is already ahead. PyTorch is also more Python-friendly. Hence this article: I chose PyTorch.

PyTorch Basics

Pytorch Tensor vs Numpy Array

  • In PyTorch, a matrix (array) is called a tensor.
    A 3*3 matrix is simply a 3*3 tensor.
    This is very similar to numpy:

Run the following code:

# import numpy library
import numpy as np
# numpy array
array = [[1,2,3],[4,5,6]]
first_array = np.array(array) # 2x3 array
print("Array Type: {}".format(type(first_array))) # type
print("Array Shape: {}".format(np.shape(first_array))) # shape
print(first_array)

# import pytorch library
import torch
# pytorch array
tensor = torch.Tensor(array)
print("Array Type: {}".format(tensor.type)) # type
print("Array Shape: {}".format(tensor.shape)) # shape
print(tensor)
  • Some operations behave the same in numpy and PyTorch:
    np.ones() = torch.ones()
    np.random.rand() = torch.rand()
# numpy ones
print("Numpy {}\n".format(np.ones((2,3))))
# pytorch ones
print(torch.ones((2,3)))

For constructing and manipulating matrices, numpy often feels more convenient, so it is common to convert a network's tensor results back to numpy arrays for inspection and display.
The two can be converted back and forth as follows:

torch.from_numpy(): from numpy to tensor
numpy(): from tensor to numpy
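
A minimal round-trip sketch of that conversion (the variable names here are just illustrative):

import numpy as np
import torch

np_array = np.ones((2, 3))
# numpy -> tensor
t = torch.from_numpy(np_array)
# tensor -> numpy
back = t.numpy()
print(type(t), type(back))
# note: torch.from_numpy shares memory with the numpy array (no copy is made)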

Below are some math operations.

Tensor addition, subtraction, multiplication, and division:

# create tensor
tensor = torch.ones(3,3)
print("\n",tensor)
# Resize
print("{}{}\n".format(tensor.view(9).shape,tensor.view(9)))
# Addition
print("Addition: {}\n".format(torch.add(tensor,tensor)))
# Subtraction
print("Subtraction: {}\n".format(tensor.sub(tensor)))
# Element wise multiplication
print("Element wise multiplication: {}\n".format(torch.mul(tensor,tensor)))
# Element wise division
print("Element wise division: {}\n".format(torch.div(tensor,tensor)))
# Mean
tensor = torch.Tensor([1,2,3,4,5])
print("Mean: {}".format(tensor.mean()))
# Standard deviation (std)
print("std: {}".format(tensor.std()))

Variables

  • Variables accumulate gradients.
  • For backpropagation in a neural network we need to compute gradients, so we need a way to handle them.
  • The difference between a Variable and a tensor is that a Variable accumulates gradients.
  • We can do addition, subtraction, multiplication, and division with Variables just like with tensors.
  • To run backward propagation we need Variables.
# import variable from pytorch library
from torch.autograd import Variable
# define variable
var = Variable(torch.ones(3), requires_grad = True)
var
# output: tensor([1., 1., 1.], requires_grad=True)
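
Note: since PyTorch 0.4, Variable has been merged into Tensor, so a tensor created with requires_grad=True behaves the same way. The Variable API above still works; a more modern equivalent is simply:

import torch
# a plain tensor that tracks gradients (no Variable wrapper needed)
var = torch.ones(3, requires_grad=True)
print(var)  # tensor([1., 1., 1.], requires_grad=True)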

Suppose we have the equation y = x^2.
Define x = [2, 4] as a Variable.
After the computation we get y = [4, 16] (since y = x^2).
Summarize with o: o = (1/2)*sum(y) = (1/2)*sum(x^2).
The derivative of o with respect to x is x.
So the result equals x, and the gradients are [2, 4].

# lets make basic backward propagation
# we have an equation that is y = x^2
array = [2,4]
tensor = torch.Tensor(array)
x = Variable(tensor, requires_grad = True)
y = x**2
print(" y = ",y)
# recap o equation o = 1/2*sum(y)
o = (1/2)*sum(y)
print(" o = ",o)
# backward
o.backward() # calculates gradients
# As defined above, Variables accumulate gradients. Here there is only one Variable, x,
# so x should have gradients.
# Lets look at gradients with x.grad
print("gradients: ",x.grad)

Logistic Regression

Let's look at a digit recognition example.

  • The steps are as follows:
  1. Import libraries
  2. Prepare the dataset
  • Use the MNIST dataset.
  • 28*28 images, 10 labels from 0 to 9.
  • The data is not normalized, so we divide every pixel by 255, the maximum pixel value.
  • To split the data we use sklearn's train_test_split.
  • 80% train data, 20% test data.
  • Build feature and target tensors. In the next step we create Variables from these tensors; we also need Variables so that gradients can be accumulated.
  • batch_size: say we have 1000 samples; we can train on all 1000 at once, or split them into 10 groups of 100 samples and train the groups one after another. batch_size is the group size.
  • epoch: one epoch means training on all samples once.
  • In this experiment we have 33600 training samples and set batch_size to 100. We set the number of epochs to 29 (accuracy is already high by epoch 29), so the data is trained 29 times. How many iterations does that take? (See the short sketch after this list.)
    Training once = training on all 33600 samples.
    But we split the data into 336 groups (group size = batch_size = 100), so 1 epoch takes 336 iterations. With 29 epochs, that is 9744 iterations in total.
  • TensorDataset(): a dataset that wraps tensors.
  • DataLoader(): combines a dataset and a sampler.
  • Visualize one of the images in the dataset.
  3. Create the Logistic Regression model

  4. Instantiate the model

  • input_dim = 28 * 28 # size of image px*px
  • output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
  • create model
  5. Instantiate the loss
  • Cross entropy loss
  6. Instantiate the optimizer
  • SGD optimizer
  7. Training the model
  8. Prediction
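
A quick sanity check of that batch/epoch/iteration arithmetic (the numbers are the ones quoted above; this is just a sketch):

num_samples = 33600
batch_size = 100
num_epochs = 29
iters_per_epoch = num_samples // batch_size   # 336 iterations per epoch
total_iters = iters_per_epoch * num_epochs    # 336 * 29 = 9744 iterations
print(iters_per_epoch, total_iters)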

The code is as follows:

# import the libraries this section needs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
# Prepare Dataset
# load data
train = pd.read_csv(r"../input/train.csv", dtype = np.float32)
# split data into features(pixels) and labels(numbers from 0 to 9)
targets_numpy = train.label.values
features_numpy = train.loc[:, train.columns != "label"].values/255 # normalization
# train test split. Size of train data is 80% and size of test data is 20%.
features_train, features_test, targets_train, targets_test = train_test_split(
    features_numpy, targets_numpy, test_size = 0.2, random_state = 42)
# create feature and targets tensor for train set. As you remember we need variable to accumulate gradients. Therefore first we create tensor, then we will create variable
featuresTrain = torch.from_numpy(features_train)
targetsTrain = torch.from_numpy(targets_train).type(torch.LongTensor) # data type is long
# create feature and targets tensor for test set.
featuresTest = torch.from_numpy(features_test)
targetsTest = torch.from_numpy(targets_test).type(torch.LongTensor) # data type is long
# batch_size, epoch and iteration
batch_size = 100
n_iters = 10000
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)
# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain,targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest,targetsTest)
# data loader
train_loader = DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = DataLoader(test, batch_size = batch_size, shuffle = False)
# visualize one of the images in data set
plt.imshow(features_numpy[10].reshape(28,28))
plt.axis("off")
plt.title(str(targets_numpy[10]))
plt.savefig('graph.png')
plt.show()

# Create Logistic Regression Model
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        # Linear part
        self.linear = nn.Linear(input_dim, output_dim)
        # There should be a logistic function here, right?
        # However, in pytorch the logistic/softmax part lives inside the loss function,
        # so it is not forgotten; it is simply applied later (in nn.CrossEntropyLoss).

    def forward(self, x):
        out = self.linear(x)
        return out

# Instantiate Model Class
input_dim = 28*28 # size of image px*px
output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
# create logistic regression model
model = LogisticRegressionModel(input_dim, output_dim)
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
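
As the comment in the model notes, nn.CrossEntropyLoss already applies log-softmax internally, which is why forward() returns raw linear outputs (logits). A small sketch of that equivalence (the logits and targets below are made-up examples):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # fake model outputs: 4 samples, 10 classes
targets = torch.tensor([0, 3, 9, 1])   # fake labels
ce = nn.CrossEntropyLoss()(logits, targets)
# equivalent: log-softmax followed by negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))         # True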
  • Training part
# Training the Model
count = 0
loss_list = []
iteration_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Define variables
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        # Clear gradients
        optimizer.zero_grad()
        # Forward propagation
        outputs = model(train)
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        # Calculate gradients
        loss.backward()
        # Update parameters
        optimizer.step()

        count += 1

        # Prediction
        if count % 50 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:
                test = Variable(images.view(-1, 28*28))
                # Forward propagation
                outputs = model(test)
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                # Total number of labels
                total += len(labels)
                # Total correct predictions
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
        if count % 100 == 0:
            # Print Loss
            print('Iteration: {} Loss: {} Accuracy: {}%'.format(count, loss.data, accuracy))

Artificial Neural Network (ANN)

Logistic regression works for classification, but as the complexity (non-linearity) of the data increases, its accuracy drops.
So we need to increase the model's complexity, which we do by adding more non-linear functions as hidden layers.

What we expect is that as complexity increases, by using more hidden layers, the model's accuracy will be higher.

ANN steps:

  1. Import libraries

  2. Prepare the dataset
    Exactly the same as the previous section.
    We use the same dataset, so we only need train_loader and test_loader.
    Batch size, epochs, and number of iterations are also the same.

  3. Create the ANN model

  • Add 3 hidden layers.
  • Use ReLU, Tanh and ELU activation functions.
  4. Instantiate the model
  • input_dim = 28*28 # size of image px*px
  • output_dim = 10 # labels 0,1,2,3,4,5,6,7,8,9
  • The hidden layer dimension is 150; this number is arbitrary, and you can try other values.
  • create model
  5. Instantiate the loss
  • Cross entropy loss
  • It also has softmax (logistic function) in it.
  6. Instantiate the optimizer
  • SGD optimizer
  7. Training the model

  8. Prediction

With hidden layers the model's accuracy reaches about 95%, much higher than before.

# Create ANN Model
class ANNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(ANNModel, self).__init__()
        # Linear function 1: 784 --> 150
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-linearity 1
        self.relu1 = nn.ReLU()
        # Linear function 2: 150 --> 150
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 2
        self.tanh2 = nn.Tanh()
        # Linear function 3: 150 --> 150
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 3
        self.elu3 = nn.ELU()
        # Linear function 4 (readout): 150 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        # Linear function 2
        out = self.fc2(out)
        # Non-linearity 2
        out = self.tanh2(out)
        # Linear function 3
        out = self.fc3(out)
        # Non-linearity 3
        out = self.elu3(out)
        # Linear function 4 (readout)
        out = self.fc4(out)
        return out

# instantiate ANN
input_dim = 28*28
hidden_dim = 150 # hidden layer dim is a hyperparameter that should be tuned; 150 is an arbitrary choice
output_dim = 10
# Create ANN
model = ANNModel(input_dim, hidden_dim, output_dim)
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.02
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# ANN model training
count = 0
loss_list = []
iteration_list = []
accuracy_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        # Clear gradients
        optimizer.zero_grad()
        # Forward propagation
        outputs = model(train)
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        # Calculating gradients
        loss.backward()
        # Update parameters
        optimizer.step()

        count += 1

        if count % 50 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:
                test = Variable(images.view(-1, 28*28))
                # Forward propagation
                outputs = model(test)
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                # Total number of labels
                total += len(labels)
                # Total correct predictions
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {} Loss: {} Accuracy: {} %'.format(count, loss.data, accuracy))

Convolutional Neural Network (CNN)

CNNs are very well suited to image classification.

CNN steps:

  1. Import libraries
  2. Prepare the dataset
  • Exactly the same as the previous sections.
  • We use the same dataset, so we only need train_loader and test_loader.
  3. Convolutional layer:
  • Creates feature maps with filters (kernels).
  • Padding: after applying a filter, the dimensions of the original image shrink. To preserve as much of the original information as possible, padding can be added around the input so the feature map keeps a larger dimension.
  • We use 2 convolutional layers.
  • out_channels is 16 (then 32 in the second layer).
  • Filter (kernel) size is 5*5.
  4. Pooling layer:
  • Prepares a condensed feature map from the convolutional layer's output (feature map).
  • We use 2 pooling layers.
  • Pooling size is 2*2.
  5. Flattening: flattens the feature maps.

  6. Fully connected layer:

  • It can be logistic regression, except that it should end with a softmax function.
  • We do not use an activation function in the FC layer.
  • We combine the convolutional part and logistic regression to create the CNN model.
  7. Instantiate the model
  • create model
  8. Instantiate the loss
  • Cross entropy loss
  • It also has softmax (logistic function) in it.
  9. Instantiate the optimizer
  • SGD optimizer
  10. Training the model

  11. Prediction
    With the convolutional layers, accuracy reaches about 98%. (A short sketch of where the 32 * 4 * 4 input size of the final linear layer comes from follows this list.)
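
A small sketch of the feature-map size arithmetic behind the 32 * 4 * 4 input of the final linear layer, using the kernel/stride/padding values from the model below:

def conv_out(size, kernel, stride=1, padding=0):
    # standard output-size formula for conv and pooling layers
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)            # conv1:    28 -> 24
size = conv_out(size, kernel=2, stride=2)  # maxpool1: 24 -> 12
size = conv_out(size, kernel=5)            # conv2:    12 -> 8
size = conv_out(size, kernel=2, stride=2)  # maxpool2:  8 -> 4
print(size)  # 4, so the flattened feature vector has 32 * 4 * 4 elements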

# Create CNN Model
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # Fully connected 1 (the flattened feature map is 32 channels of 4*4)
        self.fc1 = nn.Linear(32 * 4 * 4, 10)

    def forward(self, x):
        # x is expected to have shape (batch_size, 1, 28, 28)
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        # Max pool 1
        out = self.maxpool1(out)
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        # Max pool 2
        out = self.maxpool2(out)
        # flatten
        out = out.view(out.size(0), -1)
        # Linear function (readout)
        out = self.fc1(out)
        return out

# batch_size, epoch and iteration
batch_size = 100
n_iters = 2500
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)
# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain, targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest, targetsTest)
# data loader
train_loader = torch.utils.data.DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(test, batch_size = batch_size, shuffle = False)
# Create CNN
model = CNNModel()
# Cross Entropy Loss
error = nn.CrossEntropyLoss()
# SGD Optimizer
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
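
The training loop for the CNN is not shown above; it is essentially the same as the ANN loop, except that each batch must be reshaped to (batch_size, 1, 28, 28) before being fed to the convolutional layers. A minimal sketch under that assumption:

# CNN model training (minimal sketch, mirroring the ANN loop above)
count = 0
for epoch in range(num_epochs):
    for images, labels in train_loader:
        # reshape the flat 784-pixel rows into (batch, channel, height, width)
        train = Variable(images.view(-1, 1, 28, 28))
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = model(train)
        loss = error(outputs, labels)
        loss.backward()
        optimizer.step()
        count += 1
        if count % 500 == 0:
            # evaluate on the test set
            correct, total = 0, 0
            for images, labels in test_loader:
                test = Variable(images.view(-1, 1, 28, 28))
                predicted = torch.max(model(test).data, 1)[1]
                total += len(labels)
                correct += (predicted == labels).sum()
            accuracy = 100 * correct / float(total)
            print('Iteration: {} Loss: {} Accuracy: {} %'.format(count, loss.data, accuracy))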

Reference

https://www.kaggle.com/kanncaa1/pytorch-tutorial-for-deep-learning-lovers