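I need to write a SQL script that groups each client's transactions according to the following rule:
each group is limited to 500,000 transactions or 365 days, whichever comes first.
I have three columns: transaction_id, transaction_date, and client_id.
The result must show group_id, client_id, number_of_transaction_in_group, group_start_date, and group_end_date.
I tried this script from ChatGPT, but it does not return the correct result, as shown below: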
ClientId | GroupId | TransactionsInGroup | GroupStartDate | GroupEndDate |
---|---|---|---|---|
8 | 0 | 33101 | 2022-08-14 | 2023-08-13 |
8 | 1 | 966899 | 2023-08-14 | 2024-05-07 |
8 | 2 | 500000 | 2024-05-07 | 2024-08-12 |
8 | 3 | 417142 | 2024-08-12 | 2024-11-27 |
The expected result should look like this:
ClientId | GroupId | TransactionsInGroup | GroupStartDate | GroupEndDate |
---|---|---|---|---|
8 | 0 | 33101 | 2022-08-14 | 2023-08-13 |
8 | 1 | 500000 | 2023-08-14 | 2024-05-07 |
8 | 2 | 500000 | 2024-05-07 | 2024-08-12 |
8 | 3 | 417142 | 2024-08-12 | 2025-08-12 |
8 | 4 | 300000 | 2025-08-13 | 2026-08-12 |
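The rule in the question is sequential: where one group closes decides where the next one opens, so each group's window has to be measured from that group's own first transaction. Below is a minimal sketch of that reading, written in Python rather than SQL only so the logic can be stepped through. The function name `group_transactions` and its parameters are hypothetical, and it assumes that a group closes once it holds `max_rows` transactions or once a transaction falls `max_days` or more days after the group's first date. It reports the first and last transaction date actually present in each group, which may differ from the window end dates shown in the expected table.

```python
from datetime import date

def group_transactions(dates, max_rows=500_000, max_days=365):
    """Greedily split one client's transaction dates into groups.

    Returns a list of (group_id, count, start_date, end_date) tuples.
    Hypothetical helper: the names and the exact boundary rule are assumptions.
    """
    groups = []
    start = None   # first transaction date of the open group
    last = None    # most recent transaction date placed in the open group
    count = 0      # transactions placed in the open group so far
    for d in sorted(dates):
        # Open a new group when none is open, the row cap has been reached,
        # or this date lies max_days or more after the group's first date.
        if start is None or count >= max_rows or (d - start).days >= max_days:
            if start is not None:
                groups.append((len(groups), count, start, last))
            start, count = d, 0
        count += 1
        last = d
    if start is not None:
        groups.append((len(groups), count, start, last))
    return groups
```

With this boundary rule, a group that starts on 2022-08-14 still accepts a transaction dated 2023-08-13 (364 days later), and a transaction dated 2023-08-14 opens the next group, which matches the first two rows of the expected table. In SQL, this kind of carried state usually calls for a loop, a cursor, or a recursive CTE rather than a single fixed division.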
Code:
WITH NumberedTransactions AS
(
-- Assign a row number for each transaction per client, ordered by date
SELECT
ClientId,
TransactionId,
TransactionDate,
ROW_NUMBER() OVER (PARTITION BY ClientId ORDER BY TransactionDate) AS RowNum
FROM
Transactions
),
GroupsByTransactionCount AS
(
-- Group transactions into sets of 5 based on RowNum
SELECT
ClientId,
TransactionId,
TransactionDate,
(RowNum - 1) / 5 AS TransactionGroup
FROM
NumberedTransactions
),
GroupsByDate AS
(
-- Assign a start date for each 365-day window for each client
SELECT
ClientId,
TransactionId,
TransactionDate,
DATEDIFF(DAY, MIN(TransactionDate) OVER (PARTITION BY ClientId), TransactionDate) / 365 AS DateGroup
FROM
NumberedTransactions
),
FinalGroups AS
(
-- Combine both grouping methods into one
SELECT
c.ClientId,
c.TransactionId,
c.TransactionDate,
c.TransactionGroup,
d.DateGroup,
-- Use the larger group number to ensure both conditions are met
CASE
WHEN c.TransactionGroup >= d.DateGroup
THEN c.TransactionGroup
ELSE d.DateGroup
END AS FinalGroup
FROM
GroupsByTransactionCount AS c
INNER JOIN
GroupsByDate AS d ON c.ClientId = d.ClientId
AND c.TransactionId = d.TransactionId
)
SELECT
ClientId,
FinalGroup,
COUNT(*) AS TransactionsInGroup,
MIN(TransactionDate) AS GroupStartDate,
MAX(TransactionDate) AS GroupEndDate
FROM
FinalGroups
GROUP BY
ClientId, FinalGroup
ORDER BY
ClientId, FinalGroup;
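One likely reason this script overfills a group: both bucket numbers are measured from fixed origins (the client's first row and first date) and then combined by taking the larger one, so neither counter restarts when a new group begins. In the incorrect output above, groups 0 and 1 together hold exactly 1,000,000 rows (33,101 + 966,899), which is what that arithmetic would produce with a divisor of 500,000. The Python sketch below mirrors the same arithmetic on a tiny data set so the effect is easy to see. The function name `fixed_bucket_sizes` is hypothetical, and the row limit of 5 is an assumption taken from the divisor in the script.

```python
from collections import Counter
from datetime import date, timedelta

def fixed_bucket_sizes(dates, max_rows=5, max_days=365):
    """Mirror the script's arithmetic for one client and count rows per FinalGroup.

    Hypothetical helper for illustration; it is not a corrected query.
    """
    ordered = sorted(dates)
    first = ordered[0]
    sizes = Counter()
    for row_index, d in enumerate(ordered):
        transaction_group = row_index // max_rows     # (RowNum - 1) / 5
        date_group = (d - first).days // max_days     # DATEDIFF(DAY, first, d) / 365
        sizes[max(transaction_group, date_group)] += 1
    return dict(sizes)

# Two transactions in the first 365-day window, then ten in the second one.
sample = [date(2023, 1, 1), date(2023, 6, 1)]
sample += [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
print(fixed_bucket_sizes(sample))  # {0: 2, 1: 8, 2: 2}
```

Group 1 ends up with 8 rows even though the limit is 5: rows 3 to 5 land there because of their date, and rows 6 to 10 land there because of their row number.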
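Here is an attempt to demonstrate two use cases; both clients have 6 transactions.
The row limit below is 5 (rather than 500,000, but the limit can be changed when running against real data). Note that RowNum is ordered by date descending so that the "most recent" rows are kept.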
CREATE TABLE Transactions (
transaction_id INT,
transaction_date Date,
client_id VARCHAR(512)
);
INSERT INTO Transactions (transaction_id, transaction_date, client_id) VALUES
-- only 3 rows within the 365 days
('1', '2022-12-12', '1'), -- too old
('2', '2023-02-02', '1'), -- too old
('3', '2023-04-04', '1'), -- too old
('4', '2023-10-13', '1'),
('5', '2023-11-11', '1'),
('6', '2024-10-12', '1'), -- most recent
-- all rows within 365 days, but limit to 5 rows
('7', '2024-01-12', '2'),
('8', '2024-02-02', '2'),
('9', '2024-04-04', '2'),
('10', '2024-07-12', '2'),
('11', '2024-09-01', '2'),
('12', '2024-10-12', '2'); -- most recent
SELECT
client_id
, dateadd(day,-365,Max(transaction_date)) as min_date
, Max(transaction_date) as most_recent_date
, count(*) as rows_in_group
FROM Transactions
GROUP BY client_id
client_id | min_date | most_recent_date | rows_in_group |
---|---|---|---|
1 | 2023-10-13 | 2024-10-12 | 6 |
2 | 2023-10-13 | 2024-10-12 | 6 |
select
d.*
from (
select
t.client_id, t.transaction_date, g.min_date
, ROW_NUMBER() OVER (PARTITION BY t.Client_Id ORDER BY t.Transaction_Date DESC) AS RowNum
from Transactions as t
inner join (
SELECT
client_id
, dateadd(day,-365,Max(transaction_date)) as min_date
FROM Transactions
GROUP BY client_id
) as g on t.client_id = g.client_id
) as d
where d.RowNum <= 5 and transaction_date >= min_date
client_id | transaction_date | min_date | RowNum |
---|---|---|---|
1 | 2024-10-12 | 2023-10-13 | 1 |
1 | 2023-11-11 | 2023-10-13 | 2 |
1 | 2023-10-13 | 2023-10-13 | 3 |
2 | 2024-10-12 | 2023-10-13 | 1 |
2 | 2024-09-01 | 2023-10-13 | 2 |
2 | 2024-07-12 | 2023-10-13 | 3 |
2 | 2024-04-04 | 2023-10-13 | 4 |
2 | 2024-02-02 | 2023-10-13 | 5 |
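The filter in the last query (number each client's rows from newest to oldest, keep row numbers up to 5, and keep only dates no older than 365 days before that client's latest transaction) can be cross-checked outside the database. The Python sketch below is one way to do that; the function name `latest_window` is hypothetical, and it assumes the input is a list of (client_id, transaction_date) pairs. Like the query, it returns only the most recent window per client, not the full sequence of groups.

```python
from datetime import date, timedelta

def latest_window(rows, max_rows=5, max_days=365):
    """Return {client_id: [dates, newest first]} for each client's latest window.

    Hypothetical helper that mirrors `RowNum <= 5 and transaction_date >= min_date`.
    """
    by_client = {}
    for client_id, d in rows:
        by_client.setdefault(client_id, []).append(d)

    kept = {}
    for client_id, dates in by_client.items():
        dates.sort(reverse=True)                           # ORDER BY transaction_date DESC
        min_date = dates[0] - timedelta(days=max_days)     # dateadd(day, -365, max date)
        # Row numbers are assigned before filtering, so slice first, then filter.
        kept[client_id] = [d for d in dates[:max_rows] if d >= min_date]
    return kept
```

Run against the sample rows above, this should keep three dates for client 1 and five for client 2, the same rows as in the result table.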