TensorFlow: how to set the learning rate on a log scale, and some other TensorFlow questions

Question

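I'm a beginner in deep learning and TensorFlow, and I'm trying to implement the algorithm from this paper in TensorFlow. The paper implements it with MatConvNet + MATLAB, and I'm wondering whether TensorFlow has equivalent functions for the same things. The paper says: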

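The network parameters are initialized using the Xavier method [14]. We used the regression loss of the four wavelet subbands under an l2 penalty, and the proposed network is trained using stochastic gradient descent (SGD). The regularization parameter (λ) is 0.0001 and the momentum is 0.9. The learning rate is set from 10^-1 to 10^-4 and is reduced on a log scale at each epoch.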

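The paper uses the wavelet transform (WT) and residual learning (where the residual image = WT(HR) - WT(HR'), and HR' is used for training). The Xavier method suggests initializing the variables from a normal distribution with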

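stddev = sqrt(2 / (filter_size * filter_size * num_filters))

(Strictly speaking, this formula is the He et al. initialization; Xavier/Glorot normal initialization uses stddev = sqrt(2 / (fan_in + fan_out)).)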
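As a worked example of this formula, a layer of 3x3 filters with 64 filters gives stddev = sqrt(2 / 576), about 0.0589. The plain-Python sketch below (no TensorFlow; the helper names he_stddev and sample_filter_bank are hypothetical) computes that value and draws a kernel in the [filter_size, filter_size, in_channels, num_filters] layout that TensorFlow uses for conv2d kernels.

```python
import math
import random


def he_stddev(filter_size, num_filters):
    """stddev = sqrt(2 / (filter_size * filter_size * num_filters)), as in the question."""
    return math.sqrt(2.0 / (filter_size * filter_size * num_filters))


def sample_filter_bank(filter_size, in_channels, num_filters, seed=0):
    """Draw weights of shape [filter_size, filter_size, in_channels, num_filters]
    from a zero-mean normal distribution with the stddev above."""
    rng = random.Random(seed)
    std = he_stddev(filter_size, num_filters)
    return [[[[rng.gauss(0.0, std) for _ in range(num_filters)]
              for _ in range(in_channels)]
             for _ in range(filter_size)]
            for _ in range(filter_size)]


std = he_stddev(3, 64)  # 3x3 filters, 64 filters in the layer
print(round(std, 4))    # 0.0589
kernel = sample_filter_bank(3, 1, 64)
print(len(kernel), len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0]))  # 3 3 1 64
```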

Q1. How should I initialize the variables? Is the code below correct?

weights = tf.Variable(tf.random_normal[img_size, img_size, 1, num_filters], stddev=stddev)

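The paper does not explain in detail how the loss function is constructed. I also couldn't find an equivalent TensorFlow function for setting the learning rate on a log scale (only exponential_decay). I understand that MomentumOptimizer is equivalent to stochastic gradient descent with momentum.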

Q2: Is it possible to set the learning rate on a log scale?

Q3: How do I create the loss function described above?

I wrote the code below following this website. Assume that the model() function returns the network described in the paper and that lambda = 0.0001:

inputs = tf.placeholder(tf.float32, shape=[None, patch_size, patch_size, num_channels])
labels = tf.placeholder(tf.float32, [None, patch_size, patch_size, num_channels])

# get the model output and weights for each conv
pred, weights = model()

# define loss function
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=pred)

for weight in weights:
    regularizers += tf.nn.l2_loss(weight)

loss = tf.reduce_mean(loss + 0.0001 * regularizers)

learning_rate = tf.train.exponential_decay(???) # Not sure if we can have custom learning rate for log scale
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step)

Note: since I'm a deep learning/TensorFlow beginner, I copy-pasted code from here and there, so please feel free to correct it ;)

Answer 1

The other answers are very detailed and helpful. Below is a code example that uses a placeholder to decay the learning rate on a log scale. HTH.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, tf.Session)


# data simulation
N = 10000
D = 10
x = np.random.rand(N, D)
w = np.random.rand(D, 1)
y = np.dot(x, w)

print(y.shape)

# modeling
batch_size = 100
tni = tf.truncated_normal_initializer()
X = tf.placeholder(tf.float32, [batch_size, D])
Y = tf.placeholder(tf.float32, [batch_size, 1])
W = tf.get_variable("w", shape=[D, 1], initializer=tni)
B = tf.zeros([1])

# the learning rate is a placeholder, so any schedule can be fed in at run time
lr = tf.placeholder(tf.float32)

pred = tf.add(tf.matmul(X, W), B)
print(pred.shape)
mse = tf.losses.mean_squared_error(Y, pred)
opt = tf.train.MomentumOptimizer(lr, 0.9)

train_op = opt.minimize(mse)

learning_rate = 0.0001
num_epochs = 10
batches_per_epoch = N // batch_size

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for epoch in range(num_epochs):
    acc_err = 0.0
    for b in range(batches_per_epoch):
        idx = b * batch_size
        f = {X: x[idx:idx + batch_size, :], Y: y[idx:idx + batch_size, :], lr: learning_rate}
        _, err = sess.run([train_op, mse], feed_dict=f)
        acc_err += err
    print("Epoch {} completed. Average error = {}, LR = {}".format(
        epoch, acc_err / batches_per_epoch, learning_rate))
    # epoch done: halve the learning rate, i.e. a constant step down on a log scale
    learning_rate /= 2
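Halving the learning rate every epoch is already a constant step on a log scale. To cover the paper's range of 10^-1 to 10^-4 instead, one option is to compute each epoch's value in Python and feed it through the same lr placeholder. A minimal sketch; log_scale_lr is a hypothetical helper, and "evenly spaced in log10, one value per epoch" is an assumption about what the paper means by "reduced on a log scale at each epoch".

```python
def log_scale_lr(epoch, num_epochs, lr_start=1e-1, lr_end=1e-4):
    """Learning rate for a 0-based epoch, evenly spaced in log10 from lr_start to lr_end."""
    if num_epochs < 2:
        return lr_start
    frac = epoch / (num_epochs - 1)  # 0.0 at the first epoch, 1.0 at the last
    return lr_start * (lr_end / lr_start) ** frac


num_epochs = 4
for epoch in range(num_epochs):
    # In a TF1 training loop this value would go into feed_dict={..., lr: ...}
    print(epoch, "{:.0e}".format(log_scale_lr(epoch, num_epochs)))
```

Each epoch's rate is the previous one multiplied by the same factor, so the sequence is linear in log10(lr).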

Answer 2

Q1. How should I initialize the variables? Is the code below correct?

Use tf.get_variable, or switch to slim, which handles initialization for you automatically (example).

Q2: Is it possible to set the learning rate in log scale?

You can, but do you need to? This is not the first thing you need to fix in this network. Please check #3.

For reference, though, this is how to do it:

global_step = tf.train.get_or_create_global_step()

learning_rate_node = tf.train.exponential_decay(learning_rate=0.001, global_step=global_step, decay_steps=10000, decay_rate=0.98, staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_node).minimize(loss, global_step=global_step)
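With staircase=True, exponential_decay gives a per-epoch log-scale schedule if decay_steps is set to the number of steps per epoch: the exponent becomes the integer epoch index, so log(lr) drops by the same amount every epoch. The plain-Python sketch below reimplements the formula documented for tf.train.exponential_decay (it does not call TensorFlow) and picks decay_rate so that the rate goes from 10^-1 to 10^-4; the epoch and step counts are made-up assumptions, and staircase_exponential_decay is a hypothetical name.

```python
def staircase_exponential_decay(lr0, global_step, decay_steps, decay_rate):
    """lr0 * decay_rate ** floor(global_step / decay_steps), the staircase=True formula."""
    return lr0 * decay_rate ** (global_step // decay_steps)


# Assumed setup, not from the paper: 10 epochs of 1000 steps each.
num_epochs = 10
steps_per_epoch = 1000
lr0, lr_final = 1e-1, 1e-4

# Choose decay_rate so that the last epoch lands on lr_final.
decay_rate = (lr_final / lr0) ** (1.0 / (num_epochs - 1))
print("decay_rate =", decay_rate)  # about 0.464, i.e. 10 ** (-1/3)

for epoch in range(num_epochs):
    step = epoch * steps_per_epoch
    print(epoch, staircase_exponential_decay(lr0, step, steps_per_epoch, decay_rate))
```

In TensorFlow you would then pass decay_steps=steps_per_epoch, decay_rate=decay_rate and staircase=True.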

Q3: How to create the loss function described above?

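First, your code does not convert "pred" into an "image" (according to the paper, you need to apply the subtraction and the IDWT to obtain the final image).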
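To make the "subtraction and IDWT" step concrete: the excerpt quoted in the question does not name the wavelet, so the sketch below assumes a single-level orthonormal Haar transform, which produces exactly four subbands. It computes the residual subbands WT(HR) - WT(HR'), adds them back to WT(HR') and applies the inverse transform to recover HR. Plain Python; all function names and the toy 4x4 images are hypothetical.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform of an image with even height/width.
    Returns four half-size subbands; naming conventions for them vary between libraries."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = (zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2
            lh[r][s] = (a - b + c - d) / 2
            hl[r][s] = (a + b - c - d) / 2
            hh[r][s] = (a - b - c + d) / 2
    return [ll, lh, hl, hh]


def haar_idwt2(subbands):
    """Inverse of haar_dwt2: rebuilds the full-size image from the four subbands."""
    ll, lh, hl, hh = subbands
    img = zeros(2 * len(ll), 2 * len(ll[0]))
    for r in range(len(ll)):
        for s in range(len(ll[0])):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (p + q + u + v) / 2
            img[2 * r][2 * s + 1] = (p - q + u - v) / 2
            img[2 * r + 1][2 * s] = (p + q - u - v) / 2
            img[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return img


def band_op(xs, ys, op):
    """Apply op element-wise to two lists of subbands."""
    return [[[op(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
            for x, y in zip(xs, ys)]


# Toy 4x4 "HR" image and a degraded version HR' (values made up for illustration)
hr = [[10, 12, 14, 16], [11, 13, 15, 17], [20, 22, 24, 26], [21, 23, 25, 27]]
hr_prime = [[11, 11, 15, 15], [12, 12, 16, 16], [21, 21, 25, 25], [22, 22, 26, 26]]

# Residual target in the wavelet domain: WT(HR) - WT(HR')
residual = band_op(haar_dwt2(hr), haar_dwt2(hr_prime), lambda a, b: a - b)

# If the network predicted `residual` exactly, adding it back to WT(HR')
# and applying the inverse transform recovers HR.
recovered = haar_idwt2(band_op(haar_dwt2(hr_prime), residual, lambda a, b: a + b))
print(recovered == hr)  # True
```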

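There is one catch here: the logits have to be computed against your label data. That is, if you use the labeled data as "Y: labels", you need to write

pred = model()

pred = tf.matmul(pred, weights) + biases

logits = tf.nn.softmax(pred)

loss = tf.reduce_mean(tf.abs(logits - labels))

This gives you an output to use against Y: labels.

If the labeled images in your dataset are the denoised images, you need to follow these steps instead:

pred = model()

pred = tf.matmul(pred, weights) + biases

logits = tf.nn.softmax(pred)

image = apply_IDWT("X:input", logits)  # hypothetical helper: applies IDWT(x_label - y_label)

loss = tf.reduce_mean(tf.abs(image - labels))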

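Logits are the output of the network, and you will use this result to compute everything else. Instead of matmul, you can add a conv2d layer here, without batch normalization or an activation function, and set the number of output features to 4. For example:

pred = model()

pred = slim.conv2d(pred, 4, [3, 3], activation_fn=None, padding='SAME', scope='output')

logits = tf.nn.softmax(pred)

image = apply_IDWT("X:input", logits)  # hypothetical helper: applies IDWT(x_label - y_label)

loss = tf.reduce_mean(tf.abs(image - labels))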

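This loss function gives you basic training capability. However, it is an L1 distance, which can run into some problems (check). Consider the following case.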

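Suppose your output is the array [10, 10, 10, 0, 0] and you are trying to reach [10, 10, 10, 10, 10]. In this case your loss is 20 (10 + 10), yet you have 3/5 successes. It may also indicate some overfitting.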

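For the same target, consider the output [6, 6, 6, 6, 6]. It still has a loss of 20 (4 + 4 + 4 + 4 + 4). However, if you apply a threshold of 5, you get 5/5 successes. This is the situation we want.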

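If you use an L2 loss, the first case gives a loss of 10^2 + 10^2 = 200, while the second gives 4^2 * 5 = 80. The optimizer will therefore move away from case #1 as quickly as possible, favoring overall success over perfect success on some outputs and complete failure on others. You can apply a loss function like this: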

tf.reduce_mean(tf.nn.l2_loss(logits - image))

Alternatively, you can look at the cross-entropy loss function (it applies softmax internally, so don't apply softmax twice):

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=image, logits=pred))
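The arithmetic of the two L1/L2 cases above can be reproduced directly. In this plain-Python sketch the helper names are hypothetical, and "an output above the threshold of 5 counts as a success" is an assumption about how the answer applies the threshold.

```python
def l1(output, target):
    return sum(abs(o - t) for o, t in zip(output, target))


def l2(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target))


def successes(output, threshold=5):
    return sum(1 for o in output if o > threshold)


target = [10, 10, 10, 10, 10]
case1 = [10, 10, 10, 0, 0]
case2 = [6, 6, 6, 6, 6]
for name, out in (("case 1", case1), ("case 2", case2)):
    print(name, "L1 =", l1(out, target), "L2 =", l2(out, target), "hits =", successes(out), "/ 5")
# case 1 L1 = 20 L2 = 200 hits = 3 / 5
# case 2 L1 = 20 L2 = 80 hits = 5 / 5
```

Both cases tie under L1, while L2 penalizes the few large errors of case 1 much more heavily than the many small errors of case 2.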

Answer 3

Q1. How should I initialize the variables? Is the code below correct?

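It is correct apart from a missing opening parenthesis (and, for a conv2d kernel, the first two dimensions should be the filter size rather than the image size):

weights = tf.Variable(tf.random_normal([filter_size, filter_size, 1, num_filters], stddev=stddev))

If the variables will be reused, you can also look at tf.get_variable.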

Q2: Is it possible to set the learning rate in log scale?

Exponential decay lowers the learning rate at every step. I think what you want is tf.train.piecewise_constant, with a boundary set at every epoch.
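tf.train.piecewise_constant(x, boundaries, values) uses values[0] while x <= boundaries[0], values[1] while x <= boundaries[1], and so on, with values[-1] after the last boundary, so it needs one more value than boundaries. The sketch below builds the two lists for a per-epoch log-scale schedule from 10^-1 to 10^-4 and checks them with a plain-Python stand-in for that rule; per_epoch_schedule and lookup are hypothetical helpers, and steps_per_epoch = 1000, num_epochs = 4 are placeholder values.

```python
def per_epoch_schedule(steps_per_epoch, num_epochs, lr_start=1e-1, lr_end=1e-4):
    """Return (boundaries, values) for a rate that is constant within an epoch and
    drops by the same factor, i.e. a constant step on a log scale, at each new epoch."""
    ratio = (lr_end / lr_start) ** (1.0 / (num_epochs - 1))
    values = [lr_start * ratio ** e for e in range(num_epochs)]
    # Steps 0 .. steps_per_epoch - 1 belong to epoch 0, so boundary k is the
    # last step of epoch k; len(values) == len(boundaries) + 1.
    boundaries = [steps_per_epoch * (e + 1) - 1 for e in range(num_epochs - 1)]
    return boundaries, values


def lookup(step, boundaries, values):
    """Plain-Python stand-in for the piecewise_constant rule."""
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    return values[-1]


boundaries, values = per_epoch_schedule(steps_per_epoch=1000, num_epochs=4)
print(boundaries)  # [999, 1999, 2999]
for step in (0, 999, 1000, 2500, 5000):
    print(step, "{:.0e}".format(lookup(step, boundaries, values)))
```

In a TF1 graph the two lists would then be passed as tf.train.piecewise_constant(global_step, boundaries, values).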

Edit: take a look at the other answer, and use the staircase=True argument!

Q3: How to create the loss function described above?

Your loss function looks correct.
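For comparison with the paper's wording: the question uses softmax_cross_entropy_with_logits_v2, which is a classification loss, while the quoted paper describes a regression loss on the four wavelet subbands plus an l2 penalty with λ = 0.0001. A minimal plain-Python sketch of that objective follows, assuming mean squared error over all subband coefficients and tf.nn.l2_loss-style weight decay (the excerpt does not state the exact reduction); regression_objective is a hypothetical name. In TF1 the same shape would be tf.losses.mean_squared_error(labels, pred) + lam * tf.add_n([tf.nn.l2_loss(w) for w in weights]), which also avoids accumulating into an uninitialized regularizers variable.

```python
def l2_loss(values):
    """Same definition as tf.nn.l2_loss: sum of squares divided by 2."""
    return sum(v * v for v in values) / 2.0


def regression_objective(pred_subbands, target_subbands, weights, lam=1e-4):
    """MSE over every coefficient of the four predicted subbands, plus weight decay.

    pred_subbands / target_subbands: four flat lists of coefficients each.
    weights: one flat list of kernel values per layer.
    """
    pred = [p for band in pred_subbands for p in band]
    target = [t for band in target_subbands for t in band]
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    decay = sum(l2_loss(w) for w in weights)
    return mse + lam * decay


pred_bands = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
target_bands = [[0.0, 0.0]] * 4
layer_weights = [[3.0, 4.0]]
print(regression_objective(pred_bands, target_bands, layer_weights))
```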
