Index names must be exactly matched currently

Question

I am trying to add a Koalas DataFrame to an EntitySet. Here is the code:

subset_kdf_fp_eta_gt_prd.spark.print_schema()
root
 |-- booking_code: string (nullable = true)
 |-- order_id: string (nullable = true)
 |-- restaurant_id: string (nullable = true)
 |-- country_id: long (nullable = true)
 |-- inferred_prep_time: long (nullable = true)
 |-- inferred_wait_time: long (nullable = true)
 |-- is_integrated_model: integer (nullable = true)
 |-- sub_total: double (nullable = true)
 |-- total_quantity: integer (nullable = true)
 |-- dish_name: string (nullable = true)
 |-- sub_total_in_sgd: double (nullable = true)
 |-- city_id: long (nullable = true)
 |-- hour: integer (nullable = true)
 |-- weekday: integer (nullable = true)
 |-- request_time_epoch_utc: timestamp (nullable = true)
 |-- year: string (nullable = true)
 |-- month: string (nullable = true)
 |-- day: string (nullable = true)
 |-- is_takeaway: string (nullable = false)
 |-- is_scheduled: string (nullable = false)

import featuretools as ft
from woodwork.logical_types import Categorical, Double, Integer, NaturalLanguage, Datetime, Boolean

es = ft.EntitySet(id="koalas_es")

es.add_dataframe(dataframe_name="fp_eta_gt_prd",
                              dataframe=subset_kdf_fp_eta_gt_prd,
                              index="order_id",
                              time_index="request_time_epoch_utc",
                              already_sorted=False,  # boolean; the string "false" is truthy
                              logical_types={
                                  "booking_code": Categorical,
                                  "order_id": Categorical,
                                  "restaurant_id": Categorical,
                                  "country_id": Double,
                                  "inferred_prep_time": Double,
                                  "inferred_wait_time": Double,
                                  "is_integrated_model": Categorical,
                                  "sub_total": Double,
                                  "total_quantity": Integer,
                                  "dish_name": NaturalLanguage,
                                  "sub_total_in_sgd": Double,
                                  "city_id": Categorical,
                                  "hour": Categorical,
                                  "weekday": Categorical,
                                  "request_time_epoch_utc": Datetime,
                                  "year": Categorical,
                                  "month": Categorical,
                                  "day": Categorical,
                                  "is_takeaway": Categorical,
                                  "is_scheduled": Categorical,
                              })

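When I run this, I get the error "Index names must be exactly matched currently". I have double-checked all the field names, the uniqueness of the index, and so on. I am not sure what is causing the error.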

Answer 1

I ran into a similar situation when trying to add a column to a pyspark.sql.dataframe.DataFrame:

df['new_column'] = df.pandas_api().apply(somefunc)

This raised ValueError: Index names must be exactly matched currently.

To diagnose it, I looked at the index of the original DataFrame and the index of the result returned by apply:

result = df.pandas_api().apply(somefunc)
print(df.index)
print(result.index)

The output was:

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object', name='my_index')
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

Note that the result's index has no name at all. The ValueError should be read literally: the names on the two indexes must match exactly.

The code that fixed the problem:

result = df.pandas_api().apply(somefunc)
result.index.name = 'my_index'
df['new_column'] = result

No more ValueError!
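
The same check can be wrapped in a small helper and run before any assignment. The sketch below is only an illustration: `index_names_match` and `copy_index_name` are hypothetical names, not part of pandas-on-Spark or Featuretools. It relies only on the `.index.names` attribute, which pandas and pandas-on-Spark objects expose, and uses stand-in objects so it runs without a Spark session.

```python
from types import SimpleNamespace


def index_names_match(left, right):
    """Return True when both objects carry identical index names.

    Hypothetical helper: works on anything exposing `.index.names`.
    """
    return list(left.index.names) == list(right.index.names)


def copy_index_name(source, target):
    """Copy the index names from `source` onto `target` in place (hypothetical helper)."""
    target.index.names = list(source.index.names)
    return target


# Stand-ins so the sketch runs without Spark (an assumption for illustration);
# in practice pass the DataFrame and the result of apply().
df = SimpleNamespace(index=SimpleNamespace(names=["my_index"]))
result = SimpleNamespace(index=SimpleNamespace(names=[None]))

print(index_names_match(df, result))  # False: the result index has no name
copy_index_name(df, result)
print(index_names_match(df, result))  # True: the names now match
```

If the check returns False, copying the name across before the assignment should avoid the error in cases like the one above.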
