Strange behaviour when passing a function into a TensorFlow dataset map method

Question · Votes: 0 · Answers: 1

This was working fine for me earlier today, but when I restarted my notebook it suddenly started behaving very strangely. I have a TF dataset that takes numpy files and their corresponding labels as input, like so:

```
tf.data.Dataset.from_tensor_slices((specgram_files, labels))
```

When I take one item using

```
for item in ds.take(1): print(item)
```

I get the expected output: a tuple of tensors, where the first tensor contains the name of the numpy file as a bytes string and the second tensor contains the encoded label. I then have a function that reads the file using `np.load()` and produces a numpy array, which is then returned. This function is passed into the map() method, which looks like this:

```
ds = ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)
```

and read_npy_file looks like this:

```
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    data = np.load(data.decode())
    return data.astype(np.float32)
```

As you can see, the mapping should create another tuple of tensors, where the first tensor is the numpy array and the second tensor is the label, untouched. This worked earlier, but now it shows the strangest behaviour. I placed print statements in the read_npy_file() function to see whether the correct data was being passed in. I expected it to receive a single string, but when I call print(data) inside read_npy_file() and take one item from the dataset with ds.take(1) to trigger a single mapping, it produces this output:

```
b'./challengeA_data/log_spectrogram/2603ebb3-3cd3-43cc-98ef-0c128c515863.npy'
b'./challengeA_data/log_spectrogram/fab6a266-e97a-4935-a0c3-444fc4426fc5.npy'
b'./challengeA_data/log_spectrogram/93014682-60a2-45bd-9c9e-7f3c97b83be9.npy'
b'./challengeA_data/log_spectrogram/710f2430-5da3-4822-a252-6ad3601b92d9.npy'
b'./challengeA_data/log_spectrogram/e757058c-91de-4381-8184-65f001c95647.npy'
b'./challengeA_data/log_spectrogram/38b12689-04ba-422b-a972-5856b05ca868.npy'
b'./challengeA_data/log_spectrogram/7c9ccc04-a2d2-4eec-bafd-0c97b3658c26.npy'
b'./challengeA_data/log_spectrogram/c7cc3520-7218-4d07-9f0a-6bd7bb90a551.npy'
b'./challengeA_data/log_spectrogram/21f6060a-9766-4810-bd7c-0437f47ccb98.npy'
```

I have not modified any formatting of the output. I'd greatly appreciate any help. Working with TF is an absolute nightmare haha. Here's the full code:

```
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

# iterating over one item to trigger the mapping function
for item in specgram_ds.take(1):
    pass
```

Thanks!

python tensorflow machine-learning deep-learning tensorflow-datasets
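Stripped of tf.data entirely, the decode-and-load step can be checked in isolation. A minimal numpy-only sketch (the dummy file and its path here are made up for illustration; tf.numpy_function hands the filename to the Python function as a bytes string, which is what the `.encode()` below simulates):

```python
import os
import tempfile

import numpy as np

def read_npy_file(data):
    # tf.numpy_function passes the filename in as a bytes string,
    # so it must be decoded before np.load() can open the file.
    arr = np.load(data.decode())
    return arr.astype(np.float32)

# Hypothetical dummy file, just to exercise the function on one input.
path = os.path.join(tempfile.mkdtemp(), 'dummy-array.npy')
np.save(path, np.random.random((5, 5)))

out = read_npy_file(path.encode())
print(out.dtype, out.shape)  # float32 (5, 5)
```

Each call handles exactly one filename, so seeing many paths printed means the function was invoked many times, not that one call received many paths.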
1 Answer

2 votes
Your logic seems to be fine. You are actually just observing the combined behavior of tf.data.AUTOTUNE and print(*). According to the docs:

"If the value tf.data.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU."

You can run the following code a couple of times to observe the changes:

```
import tensorflow as tf
import numpy as np

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features
    # of a particular sound file, as a bytes string.
    # decode() is called on the bytes string to turn it into a regular string
    # so that it can be passed as a parameter into np.load().
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

# Create dummy data
for i in range(4):
    np.save('{}-array'.format(i), np.random.random((5, 5)))

specgram_files = ['/content/0-array.npy', '/content/1-array.npy',
                  '/content/2-array.npy', '/content/3-array.npy']
labels = [1, 0, 0, 1]

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(specgram_files)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

for item in specgram_ds.take(1):
    pass
```

Also see this. Finally, note that using tf.print instead of print should get rid of any side effects.
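The multiple-prints effect is not specific to tf.data: any parallel map that schedules work eagerly will run the function for elements you never consume. A plain-Python analogy (ThreadPoolExecutor stands in for the parallel map here; it is not how TF implements AUTOTUNE):

```python
from concurrent.futures import ThreadPoolExecutor

calls = []  # records every invocation, like the print() side effect did

def fake_read(path):
    calls.append(path)   # side effect fires once per scheduled call
    return path.upper()  # stand-in for np.load(...)

paths = ['a.npy', 'b.npy', 'c.npy', 'd.npy']

with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(fake_read, paths)  # schedules ALL four calls up front
    first = next(iter(results))           # consume only the first result

print(first)       # A.NPY
print(len(calls))  # 4 -- every element was processed, not just the one taken
```

The side effect happened four times even though only one result was consumed, which is the same shape of surprise as seeing nine file paths printed from a single ds.take(1).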

