I am using
========================
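I have two directories, train_data_npy and valid_data_npy, containing 3013 and 1506 *.npy files, respectively.
Each *.npy file has 12 columns of float type: the first nine columns are features, and the last three are the one-hot-encoded labels for three classes.
The following Python script is meant to load these *.npy files in chunks, so that memory does not overflow while searching for a neural network model.
However, the script fails.
What exactly is wrong with the given script? Why does it fail?
Or is the problem not in the script at all, but in the installation of CUDA, TF, or AutoKeras?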
# File: cnn_search_by_chunk.py
import numpy as np
import tensorflow as tf
import os
import autokeras as ak

N_FEATURES = 9
BATCH_SIZE = 100


def get_data_generator(folder_path, batch_size, n_features):
    """Get a generator returning batches of data from .npy files in the specified folder.

    The shape of the features is (batch_size, n_features).
    """
    def data_generator():
        files = os.listdir(folder_path)
        npy_files = [f for f in files if f.endswith('.npy')]
        for npy_file in npy_files:
            data = np.load(os.path.join(folder_path, npy_file))
            x = data[:, :n_features]
            y = data[:, n_features:]
            y = np.argmax(y, axis=1)  # Convert one-hot-encoded labels back to integers
            for i in range(0, len(x), batch_size):
                yield x[i:i+batch_size], y[i:i+batch_size]
    return data_generator


train_data_folder = '/home/my_user_name/original_data/train_data_npy'
validation_data_folder = '/home/my_user_name/original_data/valid_data_npy'

train_dataset = tf.data.Dataset.from_generator(
    get_data_generator(train_data_folder, BATCH_SIZE, N_FEATURES),
    output_signature=(
        tf.TensorSpec(shape=(None, N_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32)  # Labels are now 1D integers
    )
)

validation_dataset = tf.data.Dataset.from_generator(
    get_data_generator(validation_data_folder, BATCH_SIZE, N_FEATURES),
    output_signature=(
        tf.TensorSpec(shape=(None, N_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32)  # Labels are now 1D integers
    )
)

clf = ak.StructuredDataClassifier(overwrite=True, max_trials=1, seed=5)
clf.fit(x=train_dataset, validation_data=validation_dataset, batch_size=BATCH_SIZE)

print(clf.evaluate(validation_dataset))
my_user_name@192:~/my_project_name_v2$ python3 cnn_search_by_chunk.py
2023-11-29 20:05:53.532005: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using TensorFlow backend
2023-11-29 20:05:55.467804: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Search: Running Trial #1
Hyperparameter |Value |Best Value So Far
structured_data...|True |?
structured_data...|2 |?
structured_data...|False |?
structured_data...|0 |?
structured_data...|32 |?
structured_data...|32 |?
classification_...|0 |?
optimizer |adam |?
learning_rate |0.001 |?
Epoch 1/1000
33143/33143 [==============================] - 149s 4ms/step - loss: 0.0670 - accuracy: 0.9677 - val_loss: 0.0612 - val_accuracy: 0.9708
Epoch 2/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0625 - accuracy: 0.9697 - val_loss: 0.0598 - val_accuracy: 0.9715
Epoch 3/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0617 - accuracy: 0.9702 - val_loss: 0.0593 - val_accuracy: 0.9717
Epoch 4/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0614 - accuracy: 0.9703 - val_loss: 0.0591 - val_accuracy: 0.9718
Epoch 5/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0612 - accuracy: 0.9705 - val_loss: 0.0590 - val_accuracy: 0.9719
Epoch 6/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0610 - accuracy: 0.9707 - val_loss: 0.0588 - val_accuracy: 0.9721
Epoch 7/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0608 - accuracy: 0.9707 - val_loss: 0.0586 - val_accuracy: 0.9721
Epoch 8/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0607 - accuracy: 0.9709 - val_loss: 0.0585 - val_accuracy: 0.9723
Epoch 9/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0605 - accuracy: 0.9710 - val_loss: 0.0584 - val_accuracy: 0.9723
Epoch 10/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0604 - accuracy: 0.9710 - val_loss: 0.0583 - val_accuracy: 0.9724
Epoch 11/1000
33143/33143 [==============================] - 148s 4ms/step - loss: 0.0603 - accuracy: 0.9711 - val_loss: 0.0583 - val_accuracy: 0.9724
Epoch 12/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0602 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 13/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0601 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 14/1000
33143/33143 [==============================] - 148s 4ms/step - loss: 0.0601 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 15/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 16/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 17/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 18/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 19/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 20/1000
33143/33143 [==============================] - 144s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 21/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 22/1000
33143/33143 [==============================] - 144s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 23/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 24/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0599 - accuracy: 0.9714 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 25/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9714 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 26/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Trial 1 Complete [01h 16m 38s]
val_accuracy: 0.9724819660186768
Best val_accuracy So Far: 0.9724819660186768
Total elapsed time: 01h 16m 38s
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.5
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.5
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.6
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.6
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.7
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.7
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.8
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.8
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.9
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.9
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.10
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.10
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.11
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.11
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.12
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.12
2023-11-29 21:23:57.450991: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451029: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451059: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451091: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451123: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451157: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451185: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451213: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451250: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
Traceback (most recent call last):
File "cnn_search_by_chunk.py", line 50, in <module>
print(clf.evaluate(validation_dataset))
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/tasks/structured_data.py", line 187, in evaluate
return super().evaluate(x=x, y=y, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/auto_model.py", line 492, in evaluate
return utils.evaluate_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 68, in evaluate_with_adaptive_batch_size
return run_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 70, in <lambda>
lambda x, validation_data, **kwargs: model.evaluate(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/my_user_name/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.FailedPreconditionError: Graph execution error:
Detected at node 'model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2' defined at (most recent call last):
File "cnn_search_by_chunk.py", line 50, in <module>
print(clf.evaluate(validation_dataset))
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/tasks/structured_data.py", line 187, in evaluate
return super().evaluate(x=x, y=y, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/auto_model.py", line 492, in evaluate
return utils.evaluate_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 68, in evaluate_with_adaptive_batch_size
return run_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 70, in <lambda>
lambda x, validation_data, **kwargs: model.evaluate(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 2200, in evaluate
logs = test_function_runner.run_step(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 4000, in run_step
tmp_logs = self._function(dataset_or_iterator)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1972, in test_function
return step_function(self, iterator)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1956, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1944, in run_step
outputs = model.test_step(data)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1850, in test_step
y_pred = self(x, training=False)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 569, in __call__
return super().__call__(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/functional.py", line 512, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/functional.py", line 669, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 91, in call
for input_node, encoding_layer in zip(split_inputs, self.encoding_layers):
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 92, in call
if encoding_layer is None:
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 100, in call
output_nodes.append(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/layers/preprocessing/index_lookup.py", line 756, in call
lookups = self._lookup_dense(inputs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/layers/preprocessing/index_lookup.py", line 792, in _lookup_dense
lookups = self.lookup_table.lookup(inputs)
Node: 'model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2'
Table not initialized.
[[{{node model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2}}]] [Op:__inference_test_function_5785123]
2023-11-29 21:23:57.618149: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618266: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618360: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618434: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
my_user_name@192:~/my_project_name_v2$
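Judging from the error log, the cause is either your data or your GPU setup and installation.
GPU library warning: Cannot dlopen some GPU libraries
2023-11-29 20:05:55.467804: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries.
This warning indicates that TensorFlow had trouble loading some GPU libraries. Make sure the necessary GPU drivers and libraries are installed. For details on setting up GPU support, see the official TensorFlow GPU installation guide, which the warning itself links to: https://www.tensorflow.org/install/gpu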
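One quick way to see whether this warning matters on your machine is to ask TensorFlow which GPUs it can see. The sketch below assumes TensorFlow 2.x; the helper name visible_gpus is made up for illustration and is not part of any library.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can use.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif not gpus:
    print("No GPU visible: TensorFlow will run on the CPU.")
else:
    print("Visible GPUs:", gpus)
```

An empty list would match the "Skipping registering GPU devices..." line in your log. Note that in that log the search still ran 26 epochs after the warning, so training itself proceeded without the GPU.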
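2023-11-29 21:23:57.451123: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
There may be a problem with how the model is defined or how the data is processed. The traceback places the failing lookup at the node model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2, reached from clf.evaluate after the search had finished. From the comments I gather that the model is defined correctly, but without access to your data we can only build scaffolding (as was done successfully with MNIST). A possible step toward resolving this: check the input data to make sure it is in the format the model expects.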
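As a starting point for that check, here is a minimal sketch that validates a single file against the layout described in the question (9 feature columns followed by 3 one-hot label columns). The function name check_npy_file and the specific checks are my own assumptions about what "correct format" means here, and the demo uses small synthetic files because I do not have your data.

```python
import os
import tempfile

import numpy as np

N_FEATURES = 9  # from the question
N_CLASSES = 3   # from the question


def check_npy_file(path, n_features=N_FEATURES, n_classes=N_CLASSES):
    """Return a list of problems found in one .npy file (empty if none)."""
    data = np.load(path)
    problems = []
    if data.ndim != 2 or data.shape[1] != n_features + n_classes:
        problems.append(f"unexpected shape {data.shape}")
        return problems
    if not np.issubdtype(data.dtype, np.floating):
        problems.append(f"unexpected dtype {data.dtype}")
        return problems
    if not np.isfinite(data).all():
        problems.append("contains NaN or inf")
    labels = data[:, n_features:]
    # One-hot means every entry is 0 or 1 and each row sums to exactly 1.
    is_binary = np.isin(labels, (0.0, 1.0)).all()
    if not (is_binary and (labels.sum(axis=1) == 1).all()):
        problems.append("labels are not one-hot")
    return problems


# Demo on synthetic data: one well-formed file, one with all-zero labels.
with tempfile.TemporaryDirectory() as folder:
    rng = np.random.default_rng(0)
    features = rng.random((5, N_FEATURES)).astype(np.float32)
    classes = rng.integers(0, N_CLASSES, size=5)
    labels = np.eye(N_CLASSES, dtype=np.float32)[classes]
    good_path = os.path.join(folder, "good.npy")
    bad_path = os.path.join(folder, "bad.npy")
    np.save(good_path, np.hstack([features, labels]))
    np.save(bad_path, np.hstack([features, np.zeros_like(labels)]))
    good_report = check_npy_file(good_path)
    bad_report = check_npy_file(bad_path)

print(good_report)
print(bad_report)
```

To run it on the real data, loop over the *.npy files in train_data_npy and valid_data_npy the same way the generator in the question does, and print any file whose report is not empty.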