model.fit() fails with UnimplementedError: Cast string to float is not supported


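I can't work out where my error comes from. I have looked at various sources, but the error is always the same.

I get the following error: UnimplementedError: Cast string to float is not supported

I found this similar question (link - stackoverflow), but I don't know how to adapt it to my code.

My code is based on the following tutorial: link - original tutorial

For example, one row of my CSV file contains: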

 0 id : "azxje23d"
 1 type : 2
 2 raw : "PAYPAL - Netflix Subscription"
 3 type_paiement : "PRELEVEMENT"
 4 label : "Netflix Abonnement"
 5 amount : 22,4
 6 Category : "Subscriptions"

Code snippet:

df = pd.read_csv('based-systems-recommandation/MappingTransactions.csv', on_bad_lines='skip', sep=';', skiprows=1, names=['id', 'date', 'rdate', 'vdate', 'type', 'raw', 'type_paiement', 'label', 'amount', 'Income', 'Expense', 'balance', 'Readable label', 'Account','Company', 'Category', 'Share','Primary', 'planned', 'State'])
    df = df.drop(["date", "rdate", "vdate", "Expense","Income","balance","Readable label","Readable label","Account","Company","Share","Primary","planned","State"], axis = 1)
    
    df["amount"] = [float(str(i).replace(",", ".")) for i in df["amount"]]
    
    
    # Check the data type of the columns and if data is missing
    df.info()
    
    
    # Split data into training and validation dataframes using sample() method from Pandas
    
    train_dataframe=df.sample(frac=0.8,random_state=200)
    val_dataframe=df.drop(train_dataframe.index) 
    
    print(
        "Using %d samples for training and %d for validation"
        % (len(train_dataframe), len(val_dataframe))
    )
    
    # Convert data from Pandas dataframe to Tensorflow dataset format
    
    def dataframe_to_dataset(df):
        df = df.copy()
        labels=df.pop("Category")
        ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
        print(ds)
        return ds
    
    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)
    
    #print(train_ds)
    #print(val_ds)
    
    # Create the batches of the Tensorflow Data
    
    train_ds = train_ds.batch(5) ## For training
    val_ds = val_ds.batch(5) ## For validation
    
    # Have a look at the first batch
    
    dataset_filter = train_ds.take(1)
    result = list(dataset_filter.as_numpy_iterator())

    # We have to create different layers to process the raw input. 
    # The first layer required is the input layer.
    
    id = tf.keras.Input(shape=(1,), name="id", dtype="string")
    raw = tf.keras.Input(shape=(1,), name="raw", dtype="string")
    type_paiement = tf.keras.Input(shape=(1,), name="type_paiement", dtype="string")
    label = tf.keras.Input(shape=(1,), name="label", dtype="string")
    
    type = tf.keras.Input(shape=(1,), name="type", dtype='int64')
    amount = tf.keras.Input(shape=(1,), name="amount", dtype='float32')
    
    # Joining input blocks in an array
    
    all_inputs=[
        id, 
        raw, 
        type_paiement, 
        label, 
        type, 
        amount
    ]
    
    # Next layer: we define an encoder layer that converts categorical data into one-hot encoding and normalizes numerical data
    # We need to let it learn the vocabulary(range of input data) as a preprocessing step
    
    # Create a lookup layer which will turn strings into integer indices
    def encode_categorical_feature(feature, name, dataset):
        ... Original code
        return encoder
    
    # Create a lookup layer for numerical inputs encoder layer
    def encode_numerical_feature(feature, name, dataset):
        ... Original code
        return encoder
    
    id_encoded = encode_categorical_feature(id,"id", train_ds)
    raw_encoded = encode_categorical_feature(raw, "raw", train_ds)
    type_paiement_encoded = encode_categorical_feature(type_paiement, "type_paiement", train_ds)
    label_encoded = encode_categorical_feature(label, "label", train_ds)
    
    type_encoded = encode_numerical_feature(type, "type", train_ds)
    amount_encoded = encode_numerical_feature(amount, "amount", train_ds)
    
    # Concatenate the Encoders to form the next layer (Encoder layer)
    
    all_features = layers.concatenate(
        [
            id_encoded,
            raw_encoded,
            type_paiement_encoded,
            label_encoded,
            type_encoded,
            amount_encoded
        ]
    )
    
    x= layers.Dense(64, activation="relu")(all_features)
    x= layers.Dropout(0.5)(x)
    output= layers.Dense(1, activation="sigmoid")(x)
    model= tf.keras.Model(inputs=all_inputs, outputs=output)
    model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    model.summary()
    
    tf.keras.utils.plot_model(model,show_shapes=True,rankdir='LR', to_file='model.png')
    
    target_data = train_dataframe["Category"] 
    model.fit(train_ds, epochs=100, validation_data=val_ds)

Error:

Epoch 1/100
2024-05-29 22:36:03.489463: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
2024-05-29 22:36:03.721447: W tensorflow/core/framework/op_kernel.cc:1816] OP_REQUIRES failed at cast_op.cc:122 : UNIMPLEMENTED: Cast string to float is not supported
Traceback (most recent call last):
  File "/Users/user/Desktop/Finance LLMs Analyzer/based-systems-recommandation/mainv2.py", line 175, in <module>
    model.fit(train_ds, epochs=100, validation_data=val_ds)
  File "/Users/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node compile_loss/binary_crossentropy/Cast defined at (most recent call last):
<stack traces unavailable>
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
         [[{{node compile_loss/binary_crossentropy/Cast}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_iterator_6128]

Log:

Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             312 non-null    object 
 1   type           312 non-null    int64  
 2   raw            312 non-null    object 
 3   type_paiement  312 non-null    object 
 4   label          312 non-null    object 
 5   amount         312 non-null    float64
 6   Category       312 non-null    object 
dtypes: float64(1), int64(1), object(5)
memory usage: 17.2+ KB
Using 250 samples for training and 62 for validation
Model: "functional_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                  ┃ Output Shape              ┃         Param # ┃ Connected to               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id (InputLayer)               │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ raw (InputLayer)              │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ type_paiement (InputLayer)    │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ label (InputLayer)            │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ type (InputLayer)             │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ amount (InputLayer)           │ (None, 1)                 │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup (StringLookup)  │ (None, 251)               │               0 │ id[0][0]                   │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_1               │ (None, 156)               │               0 │ raw[0][0]                  │
│ (StringLookup)                │                           │                 │                            │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_2               │ (None, 6)                 │               0 │ type_paiement[0][0]        │
│ (StringLookup)                │                           │                 │                            │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ string_lookup_3               │ (None, 114)               │               0 │ label[0][0]                │
│ (StringLookup)                │                           │                 │                            │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ normalization (Normalization) │ (None, 1)                 │               3 │ type[0][0]                 │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ normalization_1               │ (None, 1)                 │               3 │ amount[0][0]               │
│ (Normalization)               │                           │                 │                            │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ concatenate (Concatenate)     │ (None, 529)               │               0 │ string_lookup[0][0],       │
│                               │                           │                 │ string_lookup_1[0][0],     │
│                               │                           │                 │ string_lookup_2[0][0],     │
│                               │                           │                 │ string_lookup_3[0][0],     │
│                               │                           │                 │ normalization[0][0],       │
│                               │                           │                 │ normalization_1[0][0]      │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense)                 │ (None, 64)                │          33,920 │ concatenate[0][0]          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout (Dropout)             │ (None, 64)                │               0 │ dense[0][0]                │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense)               │ (None, 1)                 │              65 │ dropout[0][0]              │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
 Total params: 33,991 (132.79 KB)
 Trainable params: 33,985 (132.75 KB)
 Non-trainable params: 6 (32.00 B)

Tags: tensorflow, keras, tf.keras

1 Answer

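The binary cross-entropy loss you are using expects the labels to be floats (between 0 and 1), but the labels you are passing are strings: the "Category" column is popped from the dataframe as-is and used as the target. This is why the failing node in the traceback is compile_loss/binary_crossentropy/Cast. You need to convert the strings to float values. One option is to map each category to a specific number in the data pipeline. For example, you could assign 0.0 to "Subscriptions" and 1.0 to every other category.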
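
A minimal sketch of that mapping, done in pandas before the dataframe is turned into a tf.data dataset. The helper name encode_binary_labels and the toy rows are assumptions made for illustration; only the "Category" column name and the "Subscriptions" value come from the question.

```python
import pandas as pd


def encode_binary_labels(labels: pd.Series, zero_class: str = "Subscriptions") -> pd.Series:
    """Return 0.0 for `zero_class` and 1.0 for every other category (hypothetical helper)."""
    # The comparison yields booleans; casting gives float32 targets that the loss can consume.
    return (labels != zero_class).astype("float32")


# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame(
    {
        "amount": [22.4, 57.0, 9.99],
        "Category": ["Subscriptions", "Groceries", "Subscriptions"],
    }
)
toy["Category"] = encode_binary_labels(toy["Category"])

print(toy["Category"].tolist())  # [0.0, 1.0, 0.0]
print(toy["Category"].dtype)     # float32
```

In the question's code, the same conversion would presumably be applied to df["Category"] before the train/validation split, so that the labels popped inside dataframe_to_dataset are already numeric.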

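Note that the loss function you are using is meant for binary classification. If you have more than two categories, you will need to make some changes. I recommend reading this tutorial, especially the multi-class classification section: Basic text classification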
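
For the multi-class case, one common approach (an assumption here, not something the question confirms) is to map each category string to an integer class id. The sketch below shows only the label encoding; the category values are invented for illustration.

```python
import pandas as pd

# Hypothetical category column; in the question it would be df["Category"].
categories = pd.Series(["Subscriptions", "Groceries", "Rent", "Groceries"])

# Build a stable, sorted vocabulary, then map every string to its integer index.
vocab = sorted(categories.unique())
class_to_id = {name: i for i, name in enumerate(vocab)}
label_ids = categories.map(class_to_id).astype("int64")

num_classes = len(vocab)

print(class_to_id)         # {'Groceries': 0, 'Rent': 1, 'Subscriptions': 2}
print(label_ids.tolist())  # [2, 0, 1, 0]
print(num_classes)         # 3
```

With integer labels like these, the output layer would typically become layers.Dense(num_classes, activation="softmax") and the loss "sparse_categorical_crossentropy". The vocabulary should be built before the data is split, so that training and validation labels share the same ids.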
