大家好,又见面了,我是你们的朋友全栈君。
1. 什么是学习率
调参的第一步是知道这个参数是什么,它的变化对模型有什么影响。
(1)要理解学习率是什么,首先得弄明白神经网络参数更新的机制-梯度下降+反向传播。参考资料:https://www.cnblogs.com/softzrp/p/6718909.html。
总结一句话:将输出误差反向传播给网络参数,以此来拟合样本的输出。本质上是最优化的一个过程,逐步趋向于最优解。但是每一次更新参数利用多少误差,就需要通过一个参数来控制,这个参数就是学习率(Learning rate),也称为步长。从bp算法的公式可以更好理解:
(2)学习率对模型的影响
从公式就可以看出,学习率越大,输出误差对参数的影响就越大,参数更新的就越快,但同时受到异常数据的影响也就越大,很容易发散。
2. 学习率指数衰减机制
在1. 中理解了学习率变化对模型的影响,我们可以看出,最理想的学习率不是固定值,而是一个随着训练次数衰减的变化的值,也就是在训练初期,学习率比较大,随着训练的进行,学习率不断减小,直到模型收敛。常用的衰减机制有:
在这三种方法中,最常用的是指数衰减,实践证明,它也是最有效的。
tensorflow中它的数学表达式为:
decayed_lr = lr0*(decay_rate^(global_steps/decay_steps)
参数解释:
decayed_lr:衰减后的学习率,也就是当前训练不使用的真实学习率
lr0: 初始学习率
decay_rate: 衰减率,每次衰减的比例
global_steps:当前训练步数
decay_steps:衰减步数,每隔多少步衰减一次。
tensorflow对应API:
global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
lr0,
global_step,
decay_steps=lr_step,
decay_rate=lr_decay,
staircase=True)
staircase=True 参数是说 global_steps/decay_steps 取整更新,也就是能做到每隔decay_steps学习率更新一次。
3. 实例解析
# -*- coding: utf-8 -*-
# @Time : 18-10-5 下午3:38
# @Author : gxrao
# @Site :
# @File : cnn_mnist_2.py
# @Software: PyCharm
import os
# os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import matplotlib.pylab as plt
from functools import reduce
import time
# prepare the data
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets('./data/MNIST/', one_hot=True)
# create the data graph and set it as default
sess = tf.InteractiveSession()
# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500
sample_size = 40000
# init weights and bias
def init_variable(w_shape, b_shape, regularizer=None):
weights = tf.get_variable('weights',w_shape,initializer=tf.truncated_normal_initializer(stddev=0.1))
if regularizer != None:
tf.add_to_collection('lossess', regularizer(weights))
biases = tf.get_variable('biases', b_shape, initializer=tf.constant_initializer(0.0))
return weights, biases
# create conv2d
def conv2d(x,w,b,keep_prob):
# conv
conv_res = tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME') + b
# activation function
activation_res = tf.nn.relu(conv_res)
# pooling
pool_res = tf.nn.max_pool(activation_res, ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
# drop out
drop_res = tf.nn.dropout(pool_res,keep_prob)
return drop_res
def inference(x,reuse=False,regularizer=None,dropout=1.0):
# layer_1:
x_img = tf.reshape(x,shape=[-1,28,28,1])
with tf.variable_scope('cnn_layer1',reuse=reuse):
weights, biases = init_variable([5,5,1,32],[32])
cnn1_res = conv2d(x_img,weights,biases,1.0)
# layer_2
with tf.variable_scope('cnn_layer2',reuse=reuse):
weights, biases = init_variable([3,3,32,64],[64])
cnn2_res = conv2d(cnn1_res,weights, biases,1.0)
# layer_3
with tf.variable_scope('cnn_layer3',reuse=reuse):
weights, biases = init_variable([3, 3, 64, 128], [128])
cnn3_res = conv2d(cnn2_res, weights, biases, 1.0)
cnn3_shape = cnn3_res.shape.as_list()[1:]
h3_s = reduce(lambda x, y: x * y, cnn3_shape)
cnn3_reshape = tf.reshape(cnn3_res,[-1,h3_s])
# layer_4
with tf.variable_scope('fcn1',reuse=reuse):
weights, biases = init_variable([h3_s,5000],[5000],regularizer)
fcn1_res = tf.nn.relu(tf.matmul(cnn3_reshape,weights)+biases)
fcn1_dropout = tf.nn.dropout(fcn1_res,dropout)
# layer_5
with tf.variable_scope('fcn2',reuse=reuse):
weights, biases = init_variable([5000,500],[500], regularizer)
fcn2_res = tf.nn.relu(tf.matmul(fcn1_dropout,weights)+biases)
fcn2_dropout = tf.nn.dropout(fcn2_res,1.0)
# output layer
with tf.variable_scope('out_put_layer',reuse=reuse):
weights, biases = init_variable([500,10],10)
y = tf.nn.softmax(tf.matmul(fcn2_dropout,weights)+biases)
return y
train_acc = []
validation_acc = []
train_loss = []
# create train model
def train_model():
start = time.time()
# add th input placeholder
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)
global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
lr0,
global_step,
decay_steps=lr_step,
decay_rate=lr_decay,
staircase=True)
# select regularizer
regularizer = tf.contrib.layers.l2_regularizer(regularizer_rate)
# define loss and select optimizer
y = inference(x, reuse=False,regularizer=None, dropout= keep_prob)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
loss = cross_entropy
train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
loss,global_step=global_step)
predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(predict, 'float'))
# train the model
# init all variables
init = tf.global_variables_initializer()
init.run()
for i in range(max_steps):
x_batch, y_batch = mnist.train.next_batch(batch_size)
_, _loss = sess.run(
[train_step, loss], feed_dict={
x: x_batch,
y_: y_batch,
keep_prob: 0.8
})
train_loss.append(_loss)
if (i + 1) % 200 == 0:
print('training steps: %d' % (i + 1))
validation_x, validation_y = mnist.validation.next_batch(500)
_validation_acc = acc.eval(feed_dict={
x: validation_x,
y_: validation_y,
keep_prob: 1.0
})
validation_acc.append(_validation_acc)
print('validation accurary is %f' % _validation_acc)
x_batch, y_batch = mnist.train.next_batch(500)
_train_acc = acc.eval(feed_dict={x: x_batch, y_: y_batch,keep_prob: 1.0})
train_acc.append(_train_acc)
print('train accurary is %f' % _train_acc)
stop = time.time()
total= (stop - start)
print('train time: %d'%int(total))
def test_model():
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)
y = inference(x, reuse=tf.AUTO_REUSE, dropout=keep_prob)
predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(predict, 'float'))
# because it raise error when i select all test data.I select 2000 images as testing data
acc_test = acc.eval(feed_dict={
x: mnist.test.images[:2000, :],
y_: mnist.test.labels[:2000, :],
keep_prob:1.0
})
print('test accurary is:%f' % acc_test)
def plot_acc():
# plot the train acc and validation acc
x_axis = [i for i in range(len(train_acc))]
plt.plot(x_axis, train_acc, 'r-', label='train accuracy')
plt.plot(x_axis, validation_acc, 'b:', label='validation accuracy')
plt.legend()
plt.xlabel('train steps')
plt.ylabel('accuracy')
plt.savefig('accuracy.jpg')
def plot_loss():
x_axis = [i for i in range(len(train_loss))]
plt.plot(x_axis, train_loss, 'r-')
plt.legend()
plt.xlabel('train steps')
plt.ylabel('train loss')
plt.savefig('train_loss.jpg')
def save_model():
# save the model
module_save_dir = './model/cnn_mnist_2/'
if not os.path.exists(module_save_dir):
os.makedirs(module_save_dir)
saver = tf.train.Saver()
saver.save(sess, module_save_dir + 'model.ckpt')
sess.close()
if __name__ == "__main__":
train_model()
test_model()
plot_acc()
plot_loss()
save_model()
这是一个简单的基于mnist的cnn分类程序。
# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500
定义了相关参数
global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
lr0,
global_step,
decay_steps=lr_step,
decay_rate=lr_decay,
staircase=True)
在使用指数衰减学习率时,一定要记得global_step = tf.Variable(0)的定义,不然后面参数不会更新
train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
loss,global_step=global_step)
训练步中指明学习率以及global_step,这两个千万不能忘记。
4. 总结
指数衰减学习率是深度学习调参过程中比较使用的一个方法,刚开始训练时,学习率以 0.01 ~ 0.001 为宜, 接近训练结束的时候,学习速率的衰减应该在100倍以上。按照这个经验去设置相关参数,对于模型的精度会有很大帮助。
发布者:全栈程序员-用户IM,转载请注明出处:https://javaforall.cn/131862.html原文链接:https://javaforall.cn
【正版授权,激活自己账号】: Jetbrains全家桶Ide使用,1年售后保障,每天仅需1毛
【官方授权 正版激活】: 官方授权 正版激活 支持Jetbrains家族下所有IDE 使用个人JB账号...