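I don't understand where my error comes from. I've checked various sources, but the error is always the same...
I get the following error: UnimplementedError: Cast string to float is not supported
I found this similar question: link - stackoverflow, but I don't know how to adapt it to my code...
My code is based on the following tutorial: link - original tutorial
For example, one record in my CSV file contains: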
0 id : "azxje23d"
1 type : 2
2 raw : "PAYPAL - Netflix Subscription"
3 type_paiement : "PRELEVEMENT"
4 label : "Netflix Abonnement"
5 amount : 22,4
6 Category : "Subscriptions"
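Because `amount` uses a comma as the decimal separator (`22,4`), one possible alternative to the manual `str(i).replace(",", ".")` conversion in the code snippet is to let pandas parse it directly through the `decimal` argument of `read_csv`. A minimal sketch, assuming a semicolon-separated file; it reads only the single example record above from an in-memory string and uses only the columns shown, not the full column list of the real file:

```python
import io

import pandas as pd

# The example record from the question, as a semicolon-separated CSV
# with a comma as the decimal separator (only the columns shown above).
csv_text = """id;type;raw;type_paiement;label;amount;Category
azxje23d;2;PAYPAL - Netflix Subscription;PRELEVEMENT;Netflix Abonnement;22,4;Subscriptions
"""

# decimal="," tells pandas to read "22,4" as the number 22.4.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(df["amount"].dtype)  # float64
print(df["amount"].iloc[0])
```

In the real code, this would mean adding `decimal=","` to the existing `read_csv` call instead of post-processing the column.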
Code snippet:
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

df = pd.read_csv(
    'based-systems-recommandation/MappingTransactions.csv',
    on_bad_lines='skip', sep=';', skiprows=1,
    names=['id', 'date', 'rdate', 'vdate', 'type', 'raw', 'type_paiement', 'label', 'amount',
           'Income', 'Expense', 'balance', 'Readable label', 'Account', 'Company', 'Category',
           'Share', 'Primary', 'planned', 'State'],
)
df = df.drop(["date", "rdate", "vdate", "Expense", "Income", "balance", "Readable label",
              "Account", "Company", "Share", "Primary", "planned", "State"], axis=1)
df["amount"] = [float(str(i).replace(",", ".")) for i in df["amount"]]
# Check the data type of the columns and if data is missing
df.info()
# Split data into training and validation dataframes using sample() method from Pandas
train_dataframe=df.sample(frac=0.8,random_state=200)
val_dataframe=df.drop(train_dataframe.index)
print(
    "Using %d samples for training and %d for validation"
    % (len(train_dataframe), len(val_dataframe))
)
# Convert data from Pandas dataframe to Tensorflow dataset format
def dataframe_to_dataset(df):
    df = df.copy()
    labels = df.pop("Category")
    ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
    print(ds)
    return ds
train_ds = dataframe_to_dataset(train_dataframe)
val_ds = dataframe_to_dataset(val_dataframe)
#print(train_ds)
#print(val_ds)
# Create the batches of the Tensorflow Data
train_ds = train_ds.batch(5) ## For training
val_ds = val_ds.batch(5) ## For validation
# Have a look at the first batch
dataset_filter = train_ds.take(1)
result = list(dataset_filter.as_numpy_iterator())
# We have to create different layers to process the raw input.
# The first layer required is the input layer.
id = tf.keras.Input(shape=(1,), name="id", dtype="string")
raw = tf.keras.Input(shape=(1,), name="raw", dtype="string")
type_paiement = tf.keras.Input(shape=(1,), name="type_paiement", dtype="string")
label = tf.keras.Input(shape=(1,), name="label", dtype="string")
type = tf.keras.Input(shape=(1,), name="type", dtype='int64')
amount = tf.keras.Input(shape=(1,), name="amount", dtype='float32')
# Joining input blocks in an array
all_inputs=[
id,
raw,
type_paiement,
label,
type,
amount
]
# Next layer: an encoder layer that converts categorical data into one-hot encoding and normalizes numerical data
# We need to let it learn the vocabulary(range of input data) as a preprocessing step
# Create a lookup layer which will turn strings into integer indices
def encode_categorical_feature(feature, name, dataset):
    # ... Original code
    return encoder
# Create a lookup layer for numerical inputs encoder layer
def encode_numerical_feature(feature, name, dataset):
    # ... Original code
    return encoder
id_encoded = encode_categorical_feature(id,"id", train_ds)
raw_encoded = encode_categorical_feature(raw, "raw", train_ds)
type_paiement_encoded = encode_categorical_feature(type_paiement, "type_paiement", train_ds)
label_encoded = encode_categorical_feature(label, "label", train_ds)
type_encoded = encode_numerical_feature(type, "type", train_ds)
amount_encoded = encode_numerical_feature(amount, "amount", train_ds)
# Concatenate the Encoders to form the next layer (Encoder layer)
all_features = layers.concatenate(
[
id_encoded,
raw_encoded,
type_paiement_encoded,
label_encoded,
type_encoded,
amount_encoded
]
)
x= layers.Dense(64, activation="relu")(all_features)
x= layers.Dropout(0.5)(x)
output= layers.Dense(1, activation="sigmoid")(x)
model= tf.keras.Model(inputs=all_inputs, outputs=output)
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
model.summary()
tf.keras.utils.plot_model(model,show_shapes=True,rankdir='LR', to_file='model.png')
target_data = train_dataframe["Category"]
model.fit(train_ds, epochs=100, validation_data=val_ds)
Error:
Epoch 1/100
2024-05-29 22:36:03.489463: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
2024-05-29 22:36:03.721447: W tensorflow/core/framework/op_kernel.cc:1816] OP_REQUIRES failed at cast_op.cc:122 : UNIMPLEMENTED: Cast string to float is not supported
Traceback (most recent call last):
File "/Users/user/Desktop/Finance LLMs Analyzer/based-systems-recommandation/mainv2.py", line 175, in <module>
model.fit(train_ds, epochs=100, validation_data=val_ds)
File "/Users/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node compile_loss/binary_crossentropy/Cast defined at (most recent call last):
<stack traces unavailable>
2 root error(s) found.
(0) UNIMPLEMENTED: Cast string to float is not supported
[[{{node compile_loss/binary_crossentropy/Cast}}]]
(1) CANCELLED: Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_iterator_6128]
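The failing node, `compile_loss/binary_crossentropy/Cast`, is where Keras converts the labels to floats before evaluating the loss. In its standard form, binary cross-entropy over a batch of $N$ examples with labels $y_i$ and sigmoid outputs $p_i$ is:

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log\left(1 - p_i\right) \right],
\qquad y_i \in \{0, 1\},\; p_i \in (0, 1)
```

Each label $y_i$ enters the sum as a number, so a string label such as "Subscriptions" has no value to plug in, and the cast fails.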
Log:
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 312 non-null object
1 type 312 non-null int64
2 raw 312 non-null object
3 type_paiement 312 non-null object
4 label 312 non-null object
5 amount 312 non-null float64
6 Category 312 non-null object
dtypes: float64(1), int64(1), object(5)
memory usage: 17.2+ KB
Using 250 samples for training and 62 for validation
Model: "functional_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ raw (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ type_paiement (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ label (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ type (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ amount (InputLayer) │ (None, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup (StringLookup) │ (None, 251) │ 0 │ id[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_1 │ (None, 156) │ 0 │ raw[0][0] │
│ (StringLookup) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_2 │ (None, 6) │ 0 │ type_paiement[0][0] │
│ (StringLookup) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_3 │ (None, 114) │ 0 │ label[0][0] │
│ (StringLookup) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ normalization (Normalization) │ (None, 1) │ 3 │ type[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ normalization_1 │ (None, 1) │ 3 │ amount[0][0] │
│ (Normalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ concatenate (Concatenate) │ (None, 529) │ 0 │ string_lookup[0][0], │
│ │ │ │ string_lookup_1[0][0], │
│ │ │ │ string_lookup_2[0][0], │
│ │ │ │ string_lookup_3[0][0], │
│ │ │ │ normalization[0][0], │
│ │ │ │ normalization_1[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense) │ (None, 64) │ 33,920 │ concatenate[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout (Dropout) │ (None, 64) │ 0 │ dense[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense) │ (None, 1) │ 65 │ dropout[0][0] │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
Total params: 33,991 (132.79 KB)
Trainable params: 33,985 (132.75 KB)
Non-trainable params: 6 (32.00 B)
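The `df.info()` log shows that `Category`, the column popped as the label in `dataframe_to_dataset`, has dtype `object`, i.e. strings. A quick check before calling `fit` can catch this. A minimal sketch, using a hypothetical one-row frame built from the example record in the question:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical one-row frame mirroring the example record from the question.
df = pd.DataFrame(
    {
        "id": ["azxje23d"],
        "type": [2],
        "raw": ["PAYPAL - Netflix Subscription"],
        "type_paiement": ["PRELEVEMENT"],
        "label": ["Netflix Abonnement"],
        "amount": [22.4],
        "Category": ["Subscriptions"],
    }
)

# Columns that are not numeric, i.e. the ones holding strings.
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
print(non_numeric)

# binary_crossentropy needs a numeric label; here Category is still text.
print(is_numeric_dtype(df["Category"]))  # False
```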
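The binary cross-entropy loss you are using expects the labels to be floats (between 0 and 1), but the labels you pass are strings, so you need to convert them to float values. One option is to map each category to a specific number in your data pipeline: for example, assign 0.0 to "Subscriptions" and 1.0 to every other category.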
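A minimal sketch of that mapping, done in pandas before `dataframe_to_dataset` is called. The helper name `binarize_category` is hypothetical, and the second row ("Food") is invented only to show a non-"Subscriptions" category:

```python
import pandas as pd


def binarize_category(df: pd.DataFrame, positive: str = "Subscriptions") -> pd.DataFrame:
    """Hypothetical helper: Category becomes 0.0 for `positive`, 1.0 otherwise."""
    out = df.copy()
    # The comparison yields booleans; astype turns them into 0.0 / 1.0 floats.
    out["Category"] = (out["Category"] != positive).astype("float32")
    return out


# Hypothetical rows: the first mirrors the example record, the second is invented.
df = pd.DataFrame({"amount": [22.4, 3.5], "Category": ["Subscriptions", "Food"]})
df = binarize_category(df)
print(df["Category"].tolist())  # [0.0, 1.0]
```

In the question's code this would go right after the `drop` call, so that `labels = df.pop("Category")` yields floats that `binary_crossentropy` can cast.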
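Note that the loss function you are using is meant for binary classification. If you have more than two categories, you will need to make some changes. I recommend reading this tutorial, especially the multi-class classification section: Basic text classification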
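For more than two categories, one common approach (not necessarily the one used in the linked tutorial) is to encode each category as an integer class index and change the model head to `layers.Dense(num_classes, activation="softmax")`, compiled with the `"sparse_categorical_crossentropy"` loss. A minimal sketch of the label-encoding half using `pandas.factorize`; the helper name `encode_categories` and the rows are hypothetical:

```python
import pandas as pd


def encode_categories(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Hypothetical helper: replace Category strings with integer class ids."""
    out = df.copy()
    # factorize assigns 0, 1, 2, ... in order of first appearance.
    codes, uniques = pd.factorize(out["Category"])
    out["Category"] = codes.astype("int64")
    return out, list(uniques)


# Hypothetical rows for illustration only.
df = pd.DataFrame({"Category": ["Subscriptions", "Food", "Subscriptions", "Transport"]})
df, class_names = encode_categories(df)
print(df["Category"].tolist())  # [0, 1, 0, 2]
print(class_names)  # ['Subscriptions', 'Food', 'Transport']

# Size of the softmax output layer.
num_classes = len(class_names)
```

Keeping `class_names` lets you map predicted indices back to category names after training.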