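I want to insert/update values from a pandas DataFrame into a Postgres table. The table has a unique constraint on the tuple (a, b). If the tuple already exists, I only want to update the third value c; if it does not exist, I want to create the triple (a, b, c).
What is the most efficient way to do this? I assume it is some kind of bulk insert, but I am not sure exactly how to go about it.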
You can convert the DataFrame into a CTE (https://www.postgresql.org/docs/current/queries-with.html) and then insert the data from the CTE into the table. Something like this:
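import pandas as pd

def convert_df_to_cte(df):
    # Render each row as a tuple of dollar-quoted strings ($str$...$str$),
    # so quotes inside the values do not need escaping.
    vals = ', \n'.join([f"{tuple([f'$str${e}$str$' for e in row])}" for row in df.values])
    # Strip the Python quotes that the tuple repr puts around each value.
    vals = vals.replace("'$str$", "$str$")
    vals = vals.replace("$str$'", "$str$")
    vals = vals.replace('"$str$', "$str$")
    vals = vals.replace('$str$"', "$str$")
    # NaN becomes SQL NULL.
    vals = vals.replace('$str$nan$str$', 'NULL')
    columns = ', \n'.join(df.columns)
    sql = f"""
    WITH vals AS (
        SELECT
            {columns}
        FROM
            (VALUES {vals}) AS t ({columns})
    )
    """
    return sql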
df = pd.DataFrame([[1, 2, 3]], columns=['col_1', 'col_2', 'col_3'])
cte_sql = convert_df_to_cte(df)
sql_to_insert = f"""
{cte_sql}
INSERT INTO schema.table (col_1, col_2, col_3)
SELECT
    col_1::integer, -- cast to the column's type to avoid errors
    col_2::integer, -- cast to the column's type to avoid errors
    col_3::character varying
FROM
    vals
ON CONFLICT (col_1, col_2) DO UPDATE SET
    col_3 = excluded.col_3;
"""
run_sql(sql_to_insert)  # run_sql stands for whatever helper you use to execute SQL
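
As a possible alternative to building the VALUES list by string replacement, the same ON CONFLICT upsert can be sent with bound parameters. The sketch below assumes psycopg2 is the driver and uses its execute_values helper; the names build_upsert_sql and upsert_rows are hypothetical and not part of the answer above. Table and column names are interpolated directly into the SQL, so they must come from trusted code, not from user input.

```python
from typing import Iterable, Sequence


def build_upsert_sql(table: str, key_cols: Sequence[str], value_cols: Sequence[str]) -> str:
    """Build INSERT ... ON CONFLICT with a single 'VALUES %s' placeholder.

    Hypothetical helper: identifiers are inserted as-is and must be trusted.
    """
    all_cols = [*key_cols, *value_cols]
    # On conflict, overwrite only the non-key columns with the incoming values.
    assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in value_cols)
    return (
        f"INSERT INTO {table} ({', '.join(all_cols)}) VALUES %s "
        f"ON CONFLICT ({', '.join(key_cols)}) DO UPDATE SET {assignments}"
    )


def upsert_rows(conn, table: str, key_cols: Sequence[str],
                value_cols: Sequence[str], rows: Iterable[tuple]) -> None:
    """Send the rows in batches through psycopg2's execute_values (assumed driver)."""
    # Imported lazily so build_upsert_sql can be used without psycopg2 installed.
    from psycopg2.extras import execute_values

    sql = build_upsert_sql(table, key_cols, value_cols)
    with conn.cursor() as cur:
        # execute_values expands the single %s into batches of row tuples.
        execute_values(cur, sql, list(rows), page_size=1000)
    conn.commit()
```

With the example DataFrame, a call might look like upsert_rows(conn, "schema.table", ["col_1", "col_2"], ["col_3"], df.itertuples(index=False, name=None)), where conn is an open psycopg2 connection. Because the values travel as parameters, no casts or dollar quoting are needed, but NaN values should be converted to None beforehand if they are meant to be stored as NULL.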