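I'm trying to create embeddings from a CSV and am hoping someone can help. I'm not sure I'm approaching this the right way, but I'd appreciate any pointers. When I try to train the model, I get:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
The sample data looks like this: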
fld0,fld1,fld2,fld3,fld4,fld5,fld6,fld7,fld8,fld9,fld10,fld11,fld12,fld13,fld14,fld15,fld16,fld17,fld18,fld19,fld20,fld21,fld22,fld23,fld24,fld25,fld26,fld27,fld28,fld29,fld30,fld31,fld32,fld33,fld34,fld35,fld36,fld37,fld38,fld39,fld40,fld41,fld42,fld43,fld44,fld45,fld46,fld47,fld48,fld49,fld50,fld51,fld52,fld53,fld54,fld55,fld56,fld57,fld58,fld59,fld60,fld61,fld62,fld63,fld64,fld65,fld66,fld67,fld68,fld69,fld70,fld71,fld72,fld73,fld74,fld75,fld76,fld77,fld78,fld79,fld80,fld81,fld82,fld83,fld84,fld85,fld86,fld87,fld88,fld89,fld90,fld91,fld92,fld93,fld94,fld95,fld96,fld97,fld98,fld99,fld100,fld101,fld102,fld103,fld104,fld105,fld106,fld107,fld108,fld109,fld110,fld111,fld112,fld113,fld114,fld115,fld116,fld117,fld118,fld119,fld120,fld121,fld122,fld123,fld124,fld125,fld126,fld127,fld128,fld129,fld130,fld131,fld132,fld133
0.5713509314139188,1,1,0,0,1,0.49030538979462923,[ 0.0756 0.0756 0.1176 0.0672 0.0588 0.0756 0.0672 0.0504 0.0336 0.1008 0.0252 0.0252 0.0252 0.0672 0.0252 0.0252 0.0168 0.0084 0.0000 0.0000 0.0000 0.0084 0.0084 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0420 ],0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
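I'm using pandas 2.2.3 and Python 3.10.13. I searched for the problem and found suggestions to use a convert() function, which I have already done. When I print the column, it looks correct to me.
My code is: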
import tensorflow as tf
import pandas as pd
import numpy as np
def convert(item):
    item = item[1:-1]  # remove `[ ]`
    item = item.strip()  # remove leading and trailing spaces
    item = np.fromstring(item, sep=' ')  # convert string to `numpy.array`
    return item
print("TensorFlow Version: "+tf.__version__)
X_train = pd.read_csv('training.csv',converters={'fld7':convert})
y_train = pd.read_csv('training_labels.csv')
print(f"{X_train.shape=}")
print(f"{y_train.shape=}")
x_row=X_train.iloc[0]
x_val=x_row['fld7']
print(f"{type(x_row)=} {x_row=}")
print(f"{type(x_val)=} {x_val=}")
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dense(2048, activation='relu'),
    tf.keras.layers.Dense(2048, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
    metrics=[
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
)
history = model.fit(X_train, y_train, epochs=25)
The output when I run the program is:
X_train.shape=(10, 134)
y_train.shape=(10, 1)
type(x_row)=<class 'pandas.core.series.Series'> x_row=fld0 0.571351
fld1 1
fld2 1
fld3 0
fld4 0
...
fld129 0
fld130 0
fld131 0
fld132 0
fld133 0
Name: 0, Length: 134, dtype: object
type(x_val)=<class 'numpy.ndarray'> x_val=array([0.0756, 0.0756, 0.1176, 0.0672, 0.0588, 0.0756, 0.0672, 0.0504,
0.0336, 0.1008, 0.0252, 0.0252, 0.0252, 0.0672, 0.0252, 0.0252,
0.0168, 0.0084, 0. , 0. , 0. , 0.0084, 0.0084, 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.042 ])
2024-10-08 20:14:52.337621: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-10-08 20:14:52.339066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20723 MB memory: -> device: 0, name: NVIDIA A10G, pci bus id: 0000:00:1e.0, compute capability: 8.6
Traceback (most recent call last):
  File "/home/ubuntu/new_model/testcase.py", line 44, in <module>
    history = model.fit(X_train, y_train, epochs=25)
  File "/opt/tensorflow/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/tensorflow/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 103, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
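The error message is fairly direct. The values in column 'fld7' are NumPy arrays, so that column has dtype object (your printed row shows `dtype: object`), and that cannot be converted to a tensor. Here is my suggestion for expanding the problematic column into new columns, one per array element.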
X_train = X_train.join(  # join the new DataFrame built from column 'fld7'
    X_train.fld7.transform(  # turn each array position into its own column
        {
            f'fld7_{i}':  # name of the new column
                lambda x, j=i:  # bind the current index as a default argument
                    x[j]  # pick the array value at that index
            for i in range(len(X_train.fld7[0]))  # one column per element of the first array
        }
    )
).drop(columns='fld7')  # drop the original array column
Note that I'm assuming every array in that column has the same length (40); otherwise the number of columns would differ from row to row.
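The equal-length assumption can be checked before expanding the column. The following is only a minimal sketch on made-up data: the Series `fld7` stands in for the parsed column, and the names `lengths`, `expected`, and `bad_rows` are hypothetical helpers, not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the parsed 'fld7' column (made-up values, length 3 instead of 40).
fld7 = pd.Series([
    np.array([0.0756, 0.1176, 0.0672]),
    np.array([0.0504, 0.0336, 0.1008]),
    np.array([0.0252, 0.0168]),  # deliberately one element short
])

lengths = fld7.map(len)                  # number of elements in each row's array
expected = lengths.mode().iloc[0]        # the most common length
bad_rows = lengths[lengths != expected]  # rows whose array has a different length

print(bad_rows)
```

If `bad_rows` is empty, the arrays are uniform and the expansion should give a rectangular table. Otherwise the listed rows probably need to be fixed or dropped first.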
Result of the `join`/`transform` expansion:
       fld0  fld1  fld2  fld3  fld4  fld5      fld6  fld8  fld9  fld10  ...  fld7_30  fld7_31  fld7_32  fld7_33  fld7_34  fld7_35  fld7_36  fld7_37  fld7_38  fld7_39
0  0.571351     1     1     0     0     1  0.490305     0     0      0  ...      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0    0.042
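As an alternative sketch (a different technique from the `transform` call, not a replacement the source prescribes), `np.stack` can build the same kind of columns in one step, and a final cast to `float32` confirms that no object-typed data remains. The toy frame and the names `block`, `fld7_cols`, `expanded`, and `X` are assumptions made for illustration. Equal array lengths are still required.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): scalar columns plus an array-valued 'fld7'.
X_train = pd.DataFrame({
    "fld0": [0.57, 0.12],
    "fld1": [1, 0],
    "fld7": [np.array([0.0756, 0.1176, 0.0672]),
             np.array([0.0504, 0.0336, 0.1008])],
})

# Stack the per-row arrays into one 2-D block (fails if lengths differ).
block = np.stack(X_train["fld7"].to_numpy())
fld7_cols = pd.DataFrame(
    block,
    index=X_train.index,
    columns=[f"fld7_{i}" for i in range(block.shape[1])],
)

# Replace the array column with its expanded scalar columns.
expanded = X_train.drop(columns="fld7").join(fld7_cols)

# Convert to a single numeric dtype, which is what a tensor conversion needs.
X = expanded.to_numpy(dtype="float32")
print(X.dtype, X.shape)
```

The resulting `X` could then be passed to `model.fit` in place of the original DataFrame.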