How to read embeddings for TensorFlow


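I am trying to use an embedding stored in a CSV file and hope someone can help. I'm not sure I'm approaching this the right way, but I would appreciate any help. When I try to train the model, I get:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

The sample data looks like this: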

fld0,fld1,fld2,fld3,fld4,fld5,fld6,fld7,fld8,fld9,fld10,fld11,fld12,fld13,fld14,fld15,fld16,fld17,fld18,fld19,fld20,fld21,fld22,fld23,fld24,fld25,fld26,fld27,fld28,fld29,fld30,fld31,fld32,fld33,fld34,fld35,fld36,fld37,fld38,fld39,fld40,fld41,fld42,fld43,fld44,fld45,fld46,fld47,fld48,fld49,fld50,fld51,fld52,fld53,fld54,fld55,fld56,fld57,fld58,fld59,fld60,fld61,fld62,fld63,fld64,fld65,fld66,fld67,fld68,fld69,fld70,fld71,fld72,fld73,fld74,fld75,fld76,fld77,fld78,fld79,fld80,fld81,fld82,fld83,fld84,fld85,fld86,fld87,fld88,fld89,fld90,fld91,fld92,fld93,fld94,fld95,fld96,fld97,fld98,fld99,fld100,fld101,fld102,fld103,fld104,fld105,fld106,fld107,fld108,fld109,fld110,fld111,fld112,fld113,fld114,fld115,fld116,fld117,fld118,fld119,fld120,fld121,fld122,fld123,fld124,fld125,fld126,fld127,fld128,fld129,fld130,fld131,fld132,fld133
0.5713509314139188,1,1,0,0,1,0.49030538979462923,[ 0.0756 0.0756 0.1176 0.0672 0.0588 0.0756 0.0672 0.0504 0.0336 0.1008 0.0252 0.0252 0.0252 0.0672 0.0252 0.0252 0.0168 0.0084 0.0000 0.0000 0.0000 0.0084 0.0084 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0420 ],0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

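I am using pandas 2.2.3 and Python 3.10.13. I searched for this problem and found a suggestion to use a convert() function, which I have already done. When I display the column, it looks correct to me.

My code is: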

import tensorflow as tf
import pandas as pd
import numpy as np


def convert(item):
    item = item[1:-1]    # remove `[ ]`
    item = item.strip()  # remove spaces at the end
    item = np.fromstring(item, sep=' ')  # convert string to `numpy.array`
    return item

print("TensorFlow Version: "+tf.__version__)

X_train = pd.read_csv('training.csv',converters={'fld7':convert})
y_train = pd.read_csv('training_labels.csv')
print(f"{X_train.shape=}")
print(f"{y_train.shape=}")

x_row=X_train.iloc[0]
x_val=x_row['fld7']
print(f"{type(x_row)=} {x_row=}")
print(f"{type(x_val)=} {x_val=}")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dense(2048, activation='relu'),
    tf.keras.layers.Dense(2048, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
    metrics=[
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
    )

history = model.fit(X_train, y_train, epochs=25)

The output when I run the program is:

X_train.shape=(10, 134)
y_train.shape=(10, 1)
type(x_row)=<class 'pandas.core.series.Series'> x_row=fld0      0.571351
fld1             1
fld2             1
fld3             0
fld4             0
            ...   
fld129           0
fld130           0
fld131           0
fld132           0
fld133           0
Name: 0, Length: 134, dtype: object
type(x_val)=<class 'numpy.ndarray'> x_val=array([0.0756, 0.0756, 0.1176, 0.0672, 0.0588, 0.0756, 0.0672, 0.0504,
       0.0336, 0.1008, 0.0252, 0.0252, 0.0252, 0.0672, 0.0252, 0.0252,
       0.0168, 0.0084, 0.    , 0.    , 0.    , 0.0084, 0.0084, 0.    ,
       0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
       0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.042 ])

2024-10-08 20:14:52.337621: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-10-08 20:14:52.339066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20723 MB memory:  -> device: 0, name: NVIDIA A10G, pci bus id: 0000:00:1e.0, compute capability: 8.6
Traceback (most recent call last):
  File "/home/ubuntu/new_model/testcase.py", line 44, in <module>
    history = model.fit(X_train, y_train, epochs=25)
  File "/opt/tensorflow/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/tensorflow/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 103, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
pandas dataframe tensorflow
1 Answer

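The error message is fairly direct. Each value in the 'fld7' column is a NumPy array, so the DataFrame as a whole converts to an object-dtype NumPy array, which TensorFlow cannot turn into a tensor. My suggestion is to expand the problematic column into new columns, one per array element: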

X_train = X_train.join(  # join the new DataFrame built from column 'fld7'
    X_train.fld7.transform(  # turn each array position into its own column
        {
            f'fld7_{i}':  # name of the new column
            lambda x, j=i:  # bind the current index as a default argument of the lambda
            x[j] for i in range(len(X_train.fld7[0]))  # take the array value at that index
        }
    )).drop(columns='fld7')  # drop the original array column

Note that I am assuming all arrays in this column have the same length (40); otherwise each row would yield a different number of columns.
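
The equal-length assumption can be checked explicitly before expanding. The following is a minimal sketch, not the code from the answer above: it uses a small made-up frame (`df`, with 3-element arrays instead of 40) in place of `X_train`, and it swaps the `transform` call for `np.stack`, which is an alternative way to get one column per array element.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: two rows, 'fld7' holds 3-element arrays.
df = pd.DataFrame({
    "fld0": [0.57, 0.12],
    "fld7": [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])],
})

# Check the equal-length assumption before expanding.
lengths = df["fld7"].map(len)
if lengths.nunique() != 1:
    raise ValueError(f"fld7 arrays differ in length: {sorted(lengths.unique().tolist())}")

# Stack the per-row arrays into one 2-D block, one column per array element.
block = np.stack(df["fld7"].tolist())
expanded = pd.DataFrame(
    block,
    columns=[f"fld7_{i}" for i in range(block.shape[1])],
    index=df.index,  # keep the original row labels so the join lines up
)

# Replace the array column with the expanded scalar columns.
result = df.drop(columns="fld7").join(expanded)
print(result.dtypes)
```

Unlike the `transform` version, this appends the new columns in the same way but fails early with a readable message if any row has a different array length.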

Result

       fld0  fld1  fld2  fld3  fld4  fld5      fld6  fld8  fld9  fld10  ...  fld7_30  fld7_31  fld7_32  fld7_33  fld7_34  fld7_35  fld7_36  fld7_37  fld7_38  fld7_39
0  0.571351     1     1     0     0     1  0.490305     0     0      0  ...      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0    0.042
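
To confirm that the array column was the cause, one can compare the dtype of the frame's NumPy representation before and after the expansion. This is a small sketch with made-up data: `before` and `after` are hypothetical stand-ins for `X_train`, and the `float32` conversion at the end is my assumption of a convenient input format for `model.fit`, which is not run here.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the original frame: one numeric column, one array column.
before = pd.DataFrame({
    "fld0": [0.57, 0.12],
    "fld7": [np.array([0.1, 0.2]), np.array([0.3, 0.4])],
})

# The same data after 'fld7' has been expanded into scalar columns.
after = pd.DataFrame({
    "fld0": [0.57, 0.12],
    "fld7_0": [0.1, 0.3],
    "fld7_1": [0.2, 0.4],
})

# A column of arrays forces the whole frame to object dtype.
print(before.to_numpy().dtype)  # object
# With only scalar columns the frame converts to a plain numeric array.
print(after.to_numpy().dtype)   # float64

# Assumed next step: hand a float32 array to model.fit instead of the DataFrame.
features = after.to_numpy(dtype="float32")
print(features.shape)  # (2, 3)
```

If the second dtype still prints `object`, some other column contains non-numeric values and needs the same treatment.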