
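Deep learning is widely used in computer vision, including image classification, object recognition, semantic segmentation, and image captioning. This blog series shares what I have learned about these topics; questions and discussion are welcome.

When a convolutional neural network is used for classification, one of the following loss functions is typically chosen:

Squared error loss
SVM loss
Softmax loss

Squared error loss performs poorly on classification problems and is generally used for regression. The softmax loss and the (multiclass) SVM loss are both widely used in practice. This post gives a brief introduction to these two losses: how to compute the loss, how to derive its gradient, and how to implement both in Python with NumPy.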

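Multiclass SVM

1. Loss function

The SVM loss used in deep learning generally follows the formulation of Weston and Watkins (1999).
The loss for the i-th sample is:

$L_i = \sum_{j \neq y_i} \max(0, f_j - f_{y_i} + \Delta)$

In practice, $\Delta$ is usually set to 1; it represents the margin.

Since our score function is:

$f = W x$

the loss can be rewritten as follows, where $w_j$ denotes the weight vector of class $j$:

$L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$

Averaging the loss over the whole training set and adding a regularization term gives:

$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k \sum_l W_{k,l}^2$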

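Intuition:
The multiclass SVM "wants" the score of the correct class to be higher than the score of every incorrect class by at least the margin $\Delta$. If the score of an incorrect class falls within that margin of the correct class score (the red region in the original figure), or exceeds it, loss is accumulated. Otherwise the loss is 0. The goal is to find weights that satisfy these constraints on the training examples while keeping the total loss as low as possible.

A concrete example:

The example comes from the Stanford CS231n course slides. The first image is a cat, and the network produces scores of 3.2, 5.1, and -1.7 for the three classes cat, car, and frog. Ideally the cat score would be higher than the other two, but here car has the highest score, so with the current weights the network classifies this image as a car. The loss for this image follows from the formula above.

The loss is computed as follows (S denotes the score):

$L_i = \max(0, S_{car} - S_{cat} + \Delta) + \max(0, S_{frog} - S_{cat} + \Delta) = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9$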
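The worked example can be reproduced in a few lines of NumPy. This is only a minimal sketch: the helper name `multiclass_hinge_loss` and the class ordering (cat, car, frog) are assumptions made for illustration, not part of the original post.

```python
import numpy as np


def multiclass_hinge_loss(scores, correct_class, delta=1.0):
    """Multiclass SVM loss for a single sample (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    # Margin of every class against the correct class, clipped at zero.
    margins = np.maximum(0.0, scores - scores[correct_class] + delta)
    # The correct class itself does not contribute to the loss.
    margins[correct_class] = 0.0
    return margins.sum()


# Scores for (cat, car, frog); the true class is cat (index 0).
scores = [3.2, 5.1, -1.7]
loss = multiclass_hinge_loss(scores, correct_class=0)
print(round(loss, 4))  # 2.9
```

If the cat score were at least 1 higher than both other scores, every margin would be clipped to zero and the loss would vanish.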


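2. Deriving the gradient

Define the following variables:
– The matrix W holds the weights and has shape D×C, where D is the feature dimension and C is the number of classes.
– The matrix X holds the samples and has shape N×D, where N is the number of samples.
– The scores are computed as f = XW, with shape N×C; each row holds the scores of one sample for all classes.

The loss for the i-th sample is:

$L_i = \sum_{j \neq y_i} \max(0, w_{:,j}^T x_{i,:} - w_{:,y_i}^T x_{i,:} + \Delta)$

The partial derivatives are:

$\frac{\partial L_i}{\partial w_{:,y_i}} = -\left( \sum_{j \neq y_i} \mathbb{1}(w_{:,j}^T x_{i,:} - w_{:,y_i}^T x_{i,:} + \Delta > 0) \right) x_{i,:}$

$\frac{\partial L_i}{\partial w_{:,j}} = \mathbb{1}(w_{:,j}^T x_{i,:} - w_{:,y_i}^T x_{i,:} + \Delta > 0) \, x_{i,:}, \quad j \neq y_i$

where:
– $w_{:,j}$ is the j-th column of W, a vector of dimension D.
– $x_{i,:}$ is the i-th row of X, the features of sample i, also of dimension D.
Their inner product is the score of sample i for class j.
– $\mathbb{1}(\cdot)$ is the indicator function.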
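One way to gain confidence in these formulas is to compare them with a finite-difference approximation. The sketch below does this for a single sample; the helper names `svm_loss_single` and `numerical_gradient`, the random test data, and the step size `h` are all assumptions chosen for illustration.

```python
import numpy as np


def svm_loss_single(W, x, y_i, delta=1.0):
    """Loss and analytic gradient of the multiclass SVM for one sample.

    W: (D, C) weights, x: (D,) features, y_i: index of the correct class.
    """
    scores = x.dot(W)                          # (C,)
    margins = scores - scores[y_i] + delta     # (C,)
    margins[y_i] = 0.0                         # correct class is excluded
    active = (margins > 0).astype(float)       # indicator 1(margin > 0)
    loss = np.sum(margins * active)

    dW = np.outer(x, active)                   # column j gets +x when active
    dW[:, y_i] = -active.sum() * x             # correct-class column
    return loss, dW


def numerical_gradient(f, W, h=1e-5):
    """Centered finite differences of a scalar function f at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                           # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))    # D = 4 features, C = 3 classes
x = rng.normal(size=4)
y_i = 1

_, dW_analytic = svm_loss_single(W, x, y_i)
dW_numeric = numerical_gradient(lambda w: svm_loss_single(w, x, y_i)[0], W)
max_diff = np.max(np.abs(dW_analytic - dW_numeric))
print(max_diff)
```

The hinge loss is not differentiable exactly where a margin equals zero, so a check like this can disagree if a margin happens to lie within `h` of that kink; away from it the two gradients should match up to rounding error.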

3. Python implementation

Both a non-vectorized and a vectorized version are given:


import numpy as np


def svm_loss_naive(W, X, y, reg):
    """
    Multiclass SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on
    minibatches of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)    # initialize the gradient as zero

    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    # Accumulate the loss and the gradient over every sample.
    for i in range(num_train):
        scores = X[i].dot(W)     # shape (C,)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1    # note delta = 1
            # Only a positive margin contributes to the loss and the gradient.
            if margin > 0:      # max(0, s_j - s_yi + 1)
                loss += margin
                dW[:, y[i]] -= X[i, :]   # column of the correct class
                dW[:, j] += X[i, :]      # column of the violating class j

    # Average over the training samples.
    loss /= num_train
    dW /= num_train

    # Regularization; the factor 0.5 makes the gradient of this term reg * W.
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    return loss, dW


def svm_loss_vectorized(W, X, y, reg):
    """
    Multiclass SVM loss function, vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    num_train = X.shape[0]
    scores = X.dot(W)        # N by C: samples by classes

    # Score of the correct class for each sample, as an N by 1 column.
    scores_correct = scores[np.arange(num_train), y].reshape(num_train, 1)

    margins = np.maximum(0.0, scores - scores_correct + 1.0)   # N by C, delta = 1
    margins[np.arange(num_train), y] = 0.0      # the correct class contributes no loss
    loss = np.sum(margins) / num_train          # average over the samples
    loss += 0.5 * reg * np.sum(W * W)           # regularization

    # Gradient: reuse margins as the coefficient matrix multiplied by X.T.
    margins[margins > 0] = 1.0                  # indicator 1(margin > 0)
    row_sum = np.sum(margins, axis=1)           # number of violating classes per sample
    margins[np.arange(num_train), y] = -row_sum  # correct class gets minus that count
    dW = np.dot(X.T, margins) / num_train + reg * W     # D by C
    return loss, dW
   
   
   
    

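Softmax loss function

1. Loss function

The softmax function generalizes the logistic function to multiclass classification.

The score function is unchanged:

$f(x_i; W) = W x_i$

The loss is the cross-entropy loss. For the i-th sample it is:

$L_i = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)$

where the probability assigned to the correct class is:

$P(y_i \mid x_i; W) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$

In practice, $e^{f_j}$ can overflow when the exponent is large, and dividing two very large numbers is numerically unstable. The numerator and denominator are therefore both multiplied by a constant $C$:

$\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \frac{C e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}$

The choice of $C$ is arbitrary. In practice it is usually set so that:

$\log C = -\max_j f_j$

That is, the largest score of the i-th sample is subtracted from all of its scores. Every shifted score is then ≤ 0, so every exponential lies in (0, 1] and cannot overflow.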
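The effect of the shift is easy to see on scores that are large enough to overflow. The following is a minimal sketch; the function names `softmax_naive` and `softmax_stable` and the example scores are assumptions chosen for illustration.

```python
import numpy as np


def softmax_naive(f):
    """Direct evaluation; overflows for large scores."""
    e = np.exp(f)
    return e / np.sum(e)


def softmax_stable(f):
    """Shift the scores by their maximum, i.e. log C = -max_j f_j."""
    shifted = f - np.max(f)        # all entries are <= 0
    e = np.exp(shifted)            # all entries lie in (0, 1]
    return e / np.sum(e)


f = np.array([123.0, 456.0, 789.0])   # example scores chosen to overflow

# Silence the overflow / invalid-value warnings raised by the naive version.
with np.errstate(over="ignore", invalid="ignore"):
    p_naive = softmax_naive(f)     # exp(789) is inf, so inf / inf gives nan
p_stable = softmax_stable(f)

print(p_naive)
print(p_stable)
```

For moderate scores the two functions agree, since multiplying the numerator and denominator by the same constant leaves the ratio unchanged.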

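2. Deriving the gradient

The gradient is derived as follows:
[Figure: derivation of the softmax gradient; image not available]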
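Since the figure is not recoverable, the following is a sketch of the standard derivation rather than a transcription of the original. It uses the notation of the SVM section ($w_{:,j}$ is the j-th column of W, $x_{i,:}$ the i-th row of X) and introduces $p_j$ as shorthand for the softmax probability of class j, which is an added symbol.

```latex
\begin{align*}
p_j &= \frac{e^{f_j}}{\sum_k e^{f_k}}, \qquad f_j = w_{:,j}^T x_{i,:} \\
L_i &= -\log p_{y_i} = -f_{y_i} + \log \sum_k e^{f_k} \\
\frac{\partial L_i}{\partial f_j} &= \frac{e^{f_j}}{\sum_k e^{f_k}} - \mathbb{1}(j = y_i) = p_j - \mathbb{1}(j = y_i) \\
\frac{\partial L_i}{\partial w_{:,j}} &= \frac{\partial L_i}{\partial f_j} \, \frac{\partial f_j}{\partial w_{:,j}} = \left( p_j - \mathbb{1}(j = y_i) \right) x_{i,:}
\end{align*}
```

Averaging over the N samples and adding the gradient of a regularizer of the form $\frac{1}{2} \cdot reg \cdot \sum W^2$ gives $X^T (P - Y) / N + reg \cdot W$, where P is the N×C matrix of probabilities and Y the one-hot label matrix. This is the expression the vectorized implementation computes as `-np.dot(X.T, y_trueClass - prob) / num_train + reg * W`.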

3. Python implementation

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on
    minibatches of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    dW_each = np.zeros_like(W)

    num_train, dim = X.shape
    num_class = W.shape[1]
    f = X.dot(W)        # N by C scores

    # Subtract the per-row maximum before exponentiating so exp() cannot overflow.
    f_max = np.reshape(np.max(f, axis=1), (num_train, 1))
    # Class probabilities, N by C; each row belongs to one sample and sums to 1.
    prob = np.exp(f - f_max) / np.sum(np.exp(f - f_max), axis=1, keepdims=True)

    y_trueClass = np.zeros_like(prob)
    y_trueClass[np.arange(num_train), y] = 1.0     # one-hot labels: 1 at the correct class

    for i in range(num_train):
        for j in range(num_class):
            loss += -(y_trueClass[i, j] * np.log(prob[i, j]))
            # Gradient of sample i with respect to column j: (prob - one_hot) * x_i
            dW_each[:, j] = -(y_trueClass[i, j] - prob[i, j]) * X[i, :]
        dW += dW_each
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W

    return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    loss = 0.0
    dW = np.zeros_like(W)    # D by C
    num_train, dim = X.shape

    f = X.dot(W)    # N by C
    # Subtract the per-row maximum for numerical stability.
    f_max = np.reshape(np.max(f, axis=1), (num_train, 1))   # N by 1
    prob = np.exp(f - f_max) / np.sum(np.exp(f - f_max), axis=1, keepdims=True)
    y_trueClass = np.zeros_like(prob)
    y_trueClass[range(num_train), y] = 1.0    # N by C one-hot labels

    # Loss: y_trueClass and np.log(prob) are both N by C.
    loss += -np.sum(y_trueClass * np.log(prob)) / num_train + 0.5 * reg * np.sum(W * W)

    # Gradient: X.T is D by N and (y_trueClass - prob) is N by C.
    dW += -np.dot(X.T, y_trueClass - prob) / num_train + reg * W

    return loss, dW
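Two quick sanity checks are common for a softmax implementation: the loss at zero weights should equal log(C), and the analytic gradient should agree with finite differences. The sketch below is self-contained, so it uses its own compact `softmax_loss` rather than the functions above; that function, the helper `numerical_gradient`, and the random test data are assumptions made for illustration.

```python
import numpy as np


def softmax_loss(W, X, y, reg):
    """Compact softmax loss and gradient (illustrative re-implementation)."""
    num_train = X.shape[0]
    f = X.dot(W)
    f -= np.max(f, axis=1, keepdims=True)          # numerical stability
    prob = np.exp(f) / np.sum(np.exp(f), axis=1, keepdims=True)

    loss = -np.mean(np.log(prob[np.arange(num_train), y]))
    loss += 0.5 * reg * np.sum(W * W)

    dscores = prob.copy()
    dscores[np.arange(num_train), y] -= 1.0        # prob - one_hot(y)
    dW = X.T.dot(dscores) / num_train + reg * W
    return loss, dW


def numerical_gradient(f, W, h=1e-5):
    """Centered finite differences of a scalar function f at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                               # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))            # N = 5 samples, D = 4 features
y = rng.integers(0, 3, size=5)         # labels in {0, 1, 2}
W = rng.normal(size=(4, 3))            # C = 3 classes
reg = 0.1

# Check 1: with zero weights every class has probability 1/C.
loss0, _ = softmax_loss(np.zeros_like(W), X, y, reg=0.0)
print(loss0, np.log(3))

# Check 2: analytic gradient versus finite differences.
_, dW_analytic = softmax_loss(W, X, y, reg)
dW_numeric = numerical_gradient(lambda w: softmax_loss(w, X, y, reg)[0], W)
max_diff = np.max(np.abs(dW_analytic - dW_numeric))
print(max_diff)
```

The same two checks can be applied to `softmax_loss_naive` and `softmax_loss_vectorized` by passing them in place of the compact function.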
   
   
   
    

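Softmax and SVM loss functions for CIFAR-10 image classification

CIFAR-10, a dataset of small images, is very convenient for practice. Implementing the basic softmax and SVM loss functions on it and visualizing some of the results helps build a deeper understanding of the algorithms.

The complete code for this post can be downloaded from GitHub.

The outputs produced when running the code are listed below:

1. Visualization of some CIFAR-10 samples

[Figure]

Loss curve of the SVM classifier using raw pixels as features

[Figure]

Loss and accuracy curves of a two-layer neural network with softmax classification

[Figure]

First hidden layer weights of the two-layer neural network with softmax classification:

[Figure]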

References

[1] http://www.jianshu.com/p/004c99623104
[2] http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92
[3] http://blog.csdn.net/acdreamers/article/details/44663305
[4] http://cs231n.github.io/

End

Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/153135.html. Original link: https://javaforall.cn
