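I currently have a C# service that uses Dapper to call a stored procedure. The procedure does two things: if the customer exists, it fetches the customer GUID and adds it to the CustomerInformations table; if the customer does not exist, it inserts the customer, then returns the GUID and adds it to the CustomerInformations table.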
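Previously, inserts ran at about 1.75 million records per hour. Now the service barely manages 200,000 records per hour. My CustomerInformations table holds about 75 million records, and I am looking to resolve the bottleneck.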
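The service calls the stored procedure once for every customer property. Each call can perform two inserts: first it adds the customer to the Customers table, then it adds the property to the CustomerInformations table. I know this may not be the ideal way to store the data, but it is not something I can change.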
C# service:
foreach (var info in request.Data)
{
    // Name of the stored procedure to call.
    string sql = "add_one_by_customer";
    object parameters = new
    {
        p_customer_first_name = info.FirstName,
        p_customer_last_name = info.LastName,
        p_customer_property_name = info.PropertyName,
        p_customer_property_value = info.PropertyValue
    };
    try
    {
        // One database round trip per customer property.
        await db.ExecuteAsync(sql, parameters, transaction: transaction, commandType: CommandType.StoredProcedure);
    }
    catch (Exception e)
    {
        // Pass the original exception along as InnerException so the cause is not lost.
        throw new Exception("Failed to insert", e);
    }
}
Postgres stored procedure:
CREATE OR REPLACE PROCEDURE add_one_by_customer(
    p_customer_first_name VARCHAR,
    p_customer_last_name VARCHAR,
    p_customer_property_name VARCHAR,
    p_customer_property_value VARCHAR
)
LANGUAGE plpgsql
AS $procedure$
DECLARE
    p_customer_id uuid;
    p_current_item_value varchar;
BEGIN
    -- Look up the customer by name.
    SELECT customer_id
    INTO p_customer_id
    FROM customers
    WHERE customer_first_name = p_customer_first_name AND
          customer_last_name = p_customer_last_name
    LIMIT 1;

    IF (p_customer_id IS NULL) THEN
        BEGIN
            INSERT INTO customers(customer_first_name, customer_last_name)
            VALUES (p_customer_first_name, p_customer_last_name)
            RETURNING customer_id INTO p_customer_id;
        EXCEPTION WHEN unique_violation THEN
            -- Another session inserted the same customer first; fetch its ID.
            p_customer_id := (SELECT customer_id
                              FROM customers
                              WHERE customer_first_name = p_customer_first_name AND
                                    customer_last_name = p_customer_last_name);
        END;
    END IF;

    -- Fetch the current value of this property, if any.
    p_current_item_value := (SELECT customer_property_value
                             FROM customer_informations
                             WHERE customer_id = p_customer_id AND
                                   customer_property_name = p_customer_property_name);

    IF (p_current_item_value IS NULL) THEN
        INSERT INTO customer_informations(customer_id, customer_property_name, customer_property_value)
        VALUES (p_customer_id, p_customer_property_name, p_customer_property_value);
    ELSIF (p_current_item_value IS NOT NULL AND p_current_item_value != p_customer_property_value) THEN
        -- Overwrite only the matching property row with the new value.
        UPDATE customer_informations
        SET customer_property_value = p_customer_property_value
        WHERE customer_id = p_customer_id AND
              customer_property_name = p_customer_property_name;
    END IF;
END; $procedure$;
Currently, my CustomerInformations table has a unique constraint on (customer_id, customer_property_name).
Things I have tried to improve:
Any tips or suggestions would be greatly appreciated.
Unique constraint on customer_informations:
CONSTRAINT ux_customer_informations UNIQUE (customer_id, customer_property_name)
Unique constraint on customers:
CONSTRAINT ux_customers UNIQUE (customer_first_name, customer_last_name)
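Your current procedure is extremely inefficient.
Avoid the nested code block with error handling (the EXCEPTION clause); it is very expensive. The same result can be achieved properly with the "SELECT or INSERT" technique I use below.
The second part is an UPSERT in disguise. That also gets much cheaper now: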
CREATE OR REPLACE PROCEDURE add_one_by_customer(
    p_customer_first_name text
  , p_customer_last_name text
  , p_customer_property_name text
  , p_customer_property_value text
)
  LANGUAGE plpgsql AS
$proc$
DECLARE
   p_customer_id uuid;
BEGIN
   -- "SELECT or INSERT": loop until we either find or create the customer.
   LOOP
      SELECT customer_id
      FROM   customers
      WHERE  customer_first_name = p_customer_first_name
      AND    customer_last_name = p_customer_last_name
      INTO   p_customer_id;

      EXIT WHEN FOUND;

      INSERT INTO customers
             (  customer_first_name,   customer_last_name)
      VALUES (p_customer_first_name, p_customer_last_name)
      ON     CONFLICT (customer_first_name, customer_last_name) DO NOTHING
      RETURNING customer_id
      INTO   p_customer_id;

      -- Not FOUND means a concurrent session inserted the row first: loop and SELECT again.
      EXIT WHEN FOUND;
   END LOOP;

   -- UPSERT the property; the WHERE clause skips updates that would not change the value.
   INSERT INTO customer_informations
          (  customer_id,   customer_property_name,   customer_property_value)
   VALUES (p_customer_id, p_customer_property_name, p_customer_property_value)
   ON     CONFLICT (customer_id, customer_property_name) DO UPDATE
   SET    customer_property_value = EXCLUDED.customer_property_value
   WHERE  customer_informations.customer_property_value IS DISTINCT FROM EXCLUDED.customer_property_value;
END
$proc$;
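To illustrate how the procedure is meant to be used, here is a short sketch of three consecutive calls. The names and property values are made-up sample data, and it assumes the procedure is installed as add_one_by_customer, the name your C# code calls.

```sql
-- Hypothetical sample data, for illustration only.

-- First call: the customer does not exist yet, so both rows get inserted.
CALL add_one_by_customer('Jane', 'Doe', 'email', 'jane@example.com');

-- Second call: same customer and property with a different value.
-- The customer is found by the SELECT; the property row is upserted.
CALL add_one_by_customer('Jane', 'Doe', 'email', 'jane.doe@example.com');

-- Third call: repeating the same value must not create a duplicate row.
CALL add_one_by_customer('Jane', 'Doe', 'email', 'jane.doe@example.com');

-- Inspect the result: one customer row and one row per (customer, property).
SELECT c.customer_first_name
     , c.customer_last_name
     , ci.customer_property_name
     , ci.customer_property_value
FROM   customers c
JOIN   customer_informations ci USING (customer_id);
```

The number of round trips from the service stays the same; the gain is that each call does less work inside the database.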
This requires a UNIQUE constraint on each of the two tables, exactly the ones you declared (ux_customer_informations and ux_customers).
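For context, the two ON CONFLICT clauses rely on table definitions along these lines. This is only a sketch: the column types, the NOT NULL markers, the foreign key and the gen_random_uuid() default are my assumptions, since your actual table definitions are not shown. Only the two UNIQUE constraints are taken from your question.

```sql
-- Hypothetical schema sketch; column types, defaults and the foreign key are assumed.
CREATE TABLE customers (
    customer_id         uuid PRIMARY KEY DEFAULT gen_random_uuid()  -- built in since PostgreSQL 13
  , customer_first_name text NOT NULL
  , customer_last_name  text NOT NULL
    -- Conflict target for: ON CONFLICT (customer_first_name, customer_last_name)
  , CONSTRAINT ux_customers UNIQUE (customer_first_name, customer_last_name)
);

CREATE TABLE customer_informations (
    customer_id             uuid NOT NULL REFERENCES customers (customer_id)
  , customer_property_name  text NOT NULL
  , customer_property_value text
    -- Conflict target for: ON CONFLICT (customer_id, customer_property_name)
  , CONSTRAINT ux_customer_informations UNIQUE (customer_id, customer_property_name)
);
```

Each UNIQUE constraint is backed by a unique index, which is what ON CONFLICT uses as its arbiter and what keeps the customer lookup cheap as the tables grow.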
If neither the stored customer_property_value nor the new value (p_customer_property_value) can be NULL, simplify the final WHERE clause to:
...
WHERE customer_informations.customer_property_value <> EXCLUDED.customer_property_value;