Computing the segment IDs needed for tf.math.segment_sum from the segment lengths in TensorFlow

Question

I am working with sequential data of variable size. Consider data like

Y = [ [.01,.02], [.03,.04], [.05,.06], [.07,.08], [.09,.1] ]
l = [ 3, 2 ]

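where Y is the result of some auxiliary computation on my data and l stores the lengths of the original sequences. In this example, [.01,.02], [.03,.04], [.05,.06] is the result of the computation on the first sequence of the batch, and [.07,.08], [.09,.1] is the result for the second sequence; the two sequences have lengths 3 and 2 respectively. Now I would like to do some further computation on the entries of Y, grouped by sequence. TensorFlow has functions such as tf.math.segment_sum that operate per group.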

Let's say I want to sum with tf.math.segment_sum. I would be interested in

seq_ids = [ 0, 0, 0, 1, 1 ]
tf.math.segment_sum(Y, segment_ids=seq_ids) #returns [ [0.09 0.12], [0.16 0.18] ]

The problem I now face is obtaining seq_ids from l. In NumPy this is easy:

seq_ids = np.digitize( np.arange(np.sum(l)), np.cumsum(l) )
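
To make the NumPy one-liner concrete, here is a small sketch that spells out the intermediate arrays for the lengths in the question. The variable names positions, boundaries and seq_ids_repeat are illustrative and not from the original post; the np.repeat line is only an alternative NumPy shortcut, not something the question relies on.

```python
import numpy as np

l = np.array([3, 2])                 # sequence lengths from the question
positions = np.arange(np.sum(l))     # [0 1 2 3 4], one position per row of Y
boundaries = np.cumsum(l)            # [3 5], exclusive end of each sequence

# With increasing bins and right=False, np.digitize returns for each position
# the number of boundaries that are <= that position, i.e. the sequence index.
seq_ids = np.digitize(positions, boundaries)
print(seq_ids)  # [0 0 0 1 1]

# Alternative (assumption: plain NumPy is acceptable): repeat index i l[i] times.
seq_ids_repeat = np.repeat(np.arange(len(l)), l)
```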

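There seems to be an equivalent of digitize named bucketize that is hidden from the Python API, as mentioned in this search for digitize in TensorFlow. However, the hidden_ops.txt file referred to there appears to have been removed from TensorFlow, and it is unclear to me whether the Python API still supports (and will keep supporting) the tensorflow::ops::Bucketize function. Another idea I had for getting a similar result was to use tf.train.piecewise_constant. However, this attempt: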

seq_ids = tf.train.piecewise_constant(tf.range(tf.math.reduce_sum(l)), tf.math.cumsum(l), tf.range(BATCH_SIZE-1))

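fails with object of type 'Tensor' has no len(). It seems that tf.train.piecewise_constant is not implemented in the most general way, since the boundaries and values parameters need to be lists rather than tensors, whereas in my case l is a 1-D tensor gathered from a minibatch of my tf.data.Dataset.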

Answer 1

Here is one way to do it:

import tensorflow as tf

def make_seq_ids(lens):
    # Get accumulated sums (e.g. [2, 3, 1] -> [2, 5, 6])
    c = tf.cumsum(lens)
    # Take all but the last accumulated sum value as indices
    idx = c[:-1]
    # Put ones on every index
    s = tf.scatter_nd(tf.expand_dims(idx, 1), tf.ones_like(idx), [c[-1]])
    # Use accumulated sums to generate ids for every segment
    return tf.cumsum(s)

with tf.Graph().as_default(), tf.Session() as sess:
    print(sess.run(make_seq_ids([2, 3, 1])))
    # [0 0 1 1 1 2]
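
The steps in make_seq_ids can be traced outside a TensorFlow session with a NumPy sketch like the one below. This is only an illustration under stated assumptions: trace_seq_ids is a hypothetical helper name, and np.add.at stands in for tf.scatter_nd (both add up updates that land on the same index).

```python
import numpy as np

def trace_seq_ids(lens):
    # Running totals, e.g. [2, 3, 1] -> [2, 5, 6]; c[-1] is the total row count.
    c = np.cumsum(lens)
    # Every total except the last is the position where a new segment starts.
    idx = c[:-1]
    # Put a 1 at each start position: [0 0 1 0 0 1] for the example above.
    s = np.zeros(c[-1], dtype=np.int64)
    np.add.at(s, idx, 1)
    # A running sum of the markers yields the segment id of every row.
    return np.cumsum(s)

print(trace_seq_ids([2, 3, 1]))  # [0 0 1 1 1 2]
```

Because repeated start positions accumulate, an empty segment in the middle (for example lengths [2, 0, 1]) simply causes its id to be skipped in this sketch.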

Edit:

You can also implement the same thing with tf.searchsorted, in a way that is closer to what you proposed for NumPy:

import tensorflow as tf

def make_seq_ids(lens):
    c = tf.cumsum(lens)
    return tf.searchsorted(c, tf.range(c[-1]), side='right')

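Neither of these implementations should be a bottleneck in a TensorFlow model, so for any practical purpose it does not matter which one you choose. It is interesting to note, however, that on my particular machine (Win 10, TF 1.12, Core i7 7700K, Titan V) the second implementation is about 1.5x slower when running on CPU and about 3.5x faster when running on GPU.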
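
As a cross-check of the searchsorted variant, the following NumPy sketch reproduces the ids and the per-sequence sums from the question. It assumes np.searchsorted as a stand-in for tf.searchsorted and uses np.add.at as a stand-in for tf.math.segment_sum; the name sums is illustrative.

```python
import numpy as np

Y = np.array([[.01, .02], [.03, .04], [.05, .06], [.07, .08], [.09, .1]])
l = np.array([3, 2])

c = np.cumsum(l)  # [3 5]
# side='right' counts the boundaries that are <= each position, so position 3
# (the first row of the second sequence) gets id 1; side='left' would give 0.
seq_ids = np.searchsorted(c, np.arange(c[-1]), side='right')  # [0 0 0 1 1]

# Stand-in for tf.math.segment_sum: add each row of Y into its segment's row.
sums = np.zeros((len(l), Y.shape[1]))
np.add.at(sums, seq_ids, Y)
print(sums)
```

The result should match the values quoted in the question, [[0.09, 0.12], [0.16, 0.18]], up to floating-point rounding.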
