You're trying to access a column, but multiple columns have that name


I am trying to join two DataFrames that both contain the columns named below. What is the best way to do a LEFT OUTER join?

df = df.join(df_forecast, ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"], 'left')

Currently, I get this error:

You're trying to access a column, but multiple columns have that name.

What am I missing?

python pyspark left-join outer-join foundry-code-repositories
1 Answer

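Let me know what you think of this. The idea is to keep the join keys once and give every other column a prefix naming the DataFrame it came from, so that no two output columns share a name: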

join_keys = ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"]

df = (
    df
    .join(df_forecast, join_keys, 'left')
    .select(
        # Joining on a list of names keeps each join key only once.
        *join_keys,
        # df[element] is already a Column, so alias it directly
        # (f.col() expects a column name string, not a Column).
        *[df[element].alias('df_'+element) for element in df.columns if element not in join_keys],
        *[df_forecast[element].alias('df_forecast_'+element) for element in df_forecast.columns if element not in join_keys]
    )
)
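
If selecting through df[...] and df_forecast[...] after the join still comes out ambiguous (which may happen when both DataFrames were derived from the same source), another option is to rename the non-key columns before joining. Below is a minimal sketch of that variant. The helper name prefix_non_key_columns is hypothetical; it only relies on the .columns attribute and the withColumnRenamed method that PySpark DataFrames provide.

```python
def prefix_non_key_columns(df, join_keys, prefix):
    """Return df with every column outside join_keys renamed to prefix + name.

    Hypothetical helper: works on any object exposing `.columns` and
    `.withColumnRenamed(existing, new)`, such as a PySpark DataFrame.
    """
    keys = set(join_keys)
    for name in df.columns:  # iterates the original column list
        if name not in keys:
            df = df.withColumnRenamed(name, prefix + name)
    return df


# Assumed usage with the DataFrames from the question (needs a Spark session):
# join_keys = ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"]
# left = prefix_non_key_columns(df, join_keys, "df_")
# right = prefix_non_key_columns(df_forecast, join_keys, "df_forecast_")
# df = left.join(right, join_keys, "left")
```

Because the names are already unique on both sides, the join result needs no select afterwards, and the join keys still appear only once.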