Strange behaviour when passing a function into a TensorFlow dataset map method

Question · Votes: 0 · Answers: 1

This was working fine for me earlier today, but when I restarted my notebook it suddenly started behaving very strangely. I have a TF dataset that takes numpy files and their corresponding labels as input, like so:

```
tf.data.Dataset.from_tensor_slices((specgram_files, labels))
```

When I take one item using

```
for item in ds.take(1): print(item)
```

I get the expected output: a tuple of tensors, where the first tensor contains the name of the numpy file as a bytes string and the second tensor contains the encoded label. I then have a function that reads the file using `np.load()` and produces a numpy array, which is then returned. This function is passed into the map() method, which looks like this:

```
ds = ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)
```

and read_npy_file looks like this:

```
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    data = np.load(data.decode())
    return data.astype(np.float32)
```

As you can see, the mapping should create another tuple of tensors, where the first tensor is the numpy array and the second tensor is the label, untouched. This worked earlier, but now it shows the strangest behaviour. I placed print statements in the read_npy_file() function to see whether the correct data was being passed in. I expected it to receive a single string, but when I call print(data) inside read_npy_file() and take one item from the dataset with ds.take(1) to trigger a single mapping, it produces this output:

```
b'./challengeA_data/log_spectrogram/2603ebb3-3cd3-43cc-98ef-0c128c515863.npy'
b'./challengeA_data/log_spectrogram/fab6a266-e97a-4935-a0c3-444fc4426fc5.npy'
b'./challengeA_data/log_spectrogram/93014682-60a2-45bd-9c9e-7f3c97b83be9.npy'
b'./challengeA_data/log_spectrogram/710f2430-5da3-4822-a252-6ad3601b92d9.npy'
b'./challengeA_data/log_spectrogram/e757058c-91de-4381-8184-65f001c95647.npy'
b'./challengeA_data/log_spectrogram/38b12689-04ba-422b-a972-5856b05ca868.npy'
b'./challengeA_data/log_spectrogram/7c9ccc04-a2d2-4eec-bafd-0c97b3658c26.npy'
b'./challengeA_data/log_spectrogram/c7cc3520-7218-4d07-9f0a-6bd7bb90a551.npy'
b'./challengeA_data/log_spectrogram/21f6060a-9766-4810-bd7c-0437f47ccb98.npy'
```

I have not modified any formatting of the output. I'd greatly appreciate any help. Working with TF is an absolute nightmare haha. Here's the full code:

```
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

# iterating over one item to trigger the mapping function
for item in specgram_ds.take(1):
    pass
```

Thanks!

python tensorflow machine-learning deep-learning tensorflow-datasets
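Stripped of tf.data entirely, the decode-and-load step can be checked in isolation. A minimal numpy-only sketch (the dummy file and its path here are made up for illustration; tf.numpy_function hands the filename to the Python function as a bytes string, which is what the `.encode()` below simulates):

```python
import os
import tempfile

import numpy as np

def read_npy_file(data):
    # tf.numpy_function passes the filename in as a bytes string,
    # so it must be decoded before np.load() can open the file.
    arr = np.load(data.decode())
    return arr.astype(np.float32)

# Hypothetical dummy file, just to exercise the function on one input.
path = os.path.join(tempfile.mkdtemp(), 'dummy-array.npy')
np.save(path, np.random.random((5, 5)))

out = read_npy_file(path.encode())
print(out.dtype, out.shape)  # float32 (5, 5)
```

Each call handles exactly one filename, so seeing many paths printed means the function was invoked many times, not that one call received many paths.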
1 Answer

2 votes
Your logic seems to be fine. You are actually just observing the combined behavior of tf.data.AUTOTUNE and print(*). According to the docs:

"If the value tf.data.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU."

You can run the following code a couple of times to observe the changes:

```
import tensorflow as tf
import numpy as np

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

# Create dummy data
for i in range(4):
    np.save('{}-array'.format(i), np.random.random((5, 5)))

specgram_files = ['/content/0-array.npy', '/content/1-array.npy',
                  '/content/2-array.npy', '/content/3-array.npy']
labels = [1, 0, 0, 1]

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(specgram_files)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

for item in specgram_ds.take(1):
    pass
```

Also see this. Finally, note that using tf.print instead of print should get rid of any side effects.
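The multiple-prints effect is not specific to tf.data: any parallel map that schedules work eagerly will run the function for elements you never consume. A plain-Python analogy (ThreadPoolExecutor stands in for the parallel map here; it is not how TF implements AUTOTUNE):

```python
from concurrent.futures import ThreadPoolExecutor

calls = []  # records every invocation, like the print() side effect did

def fake_read(path):
    calls.append(path)   # side effect fires once per scheduled call
    return path.upper()  # stand-in for np.load(...)

paths = ['a.npy', 'b.npy', 'c.npy', 'd.npy']

with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(fake_read, paths)  # schedules ALL four calls up front
    first = next(iter(results))           # consume only the first result

print(first)       # A.NPY
print(len(calls))  # 4 -- every element was processed, not just the one taken
```

The side effect happened four times even though only one result was consumed, which is the same shape of surprise as seeing nine file paths printed from a single ds.take(1).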

