I have this table (myt):
CREATE TABLE myt (
    name VARCHAR(50),
    food VARCHAR(50),
    d1 INT
);
INSERT INTO myt (name, food, d1) VALUES
    ('john', 'pizza', 2010),
    ('john', 'pizza', 2011),
    ('john', 'cake', 2012),
    ('tim', 'apples', 2015),
    ('david', 'apples', 2020),
    ('david', 'apples', 2021),
    ('alex', 'cookies', 2005),
    ('alex', 'cookies', 2006);
name   food     d1    food_year
john   pizza    2010  2010
john   pizza    2011  2011
john   cake     2012  2012
tim    apples   2015  2015
david  apples   2020  2020
david  apples   2021  2021
alex   cookies  2005  2005
alex   cookies  2006  2006
I wrote the following query to find the percentage breakdown of each food within each name:
WITH FoodCounts AS (
    SELECT name,
           food,
           COUNT(*) AS food_count
    FROM myt
    GROUP BY name, food
),
TotalCounts AS (
    SELECT name,
           COUNT(*) AS total_count
    FROM myt
    GROUP BY name
)
SELECT fc.name,
       fc.food,
       (fc.food_count * 100.0) / tc.total_count AS percentage
FROM FoodCounts fc
JOIN TotalCounts tc
    ON fc.name = tc.name;
name   food     percentage
alex   cookies  100.00000
david  apples   100.00000
john   cake     33.33333
john   pizza    66.66667
tim    apples   100.00000
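I am now trying to modify this query to find the cumulative percentages. For example, as of 2011, what is John's food breakdown? As of 2012, what is John's food breakdown?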
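To make the target concrete, here is a minimal sketch of one reading of "breakdown as of a year": count each food over all of a person's rows up to and including that year, then divide by the total. The helper `breakdown_as_of` is a hypothetical name, and Python is used only to spell out the arithmetic on John's three sample rows.

```python
from collections import Counter

# John's rows copied from the sample data: (name, food, year).
rows = [
    ("john", "pizza", 2010),
    ("john", "pizza", 2011),
    ("john", "cake", 2012),
]


def breakdown_as_of(rows, name, year):
    """Percentage of each food among `name`'s rows with year <= `year`."""
    counts = Counter(food for n, food, y in rows if n == name and y <= year)
    total = sum(counts.values())
    return {food: round(100.0 * c / total, 2) for food, c in counts.items()}


print(breakdown_as_of(rows, "john", 2011))  # {'pizza': 100.0}
print(breakdown_as_of(rows, "john", 2012))  # {'pizza': 66.67, 'cake': 33.33}
```

Under this reading, each year produces one row per food seen so far, rather than one row per year.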
I tried writing a series of CTEs with window functions to answer this:
WITH YearlyFoodCounts AS (
    SELECT name,
           food,
           food_year,
           COUNT(*) AS food_count
    FROM myt
    GROUP BY name, food, food_year
),
CumulativeCounts AS (
    SELECT name,
           food_year,
           SUM(food_count) OVER (PARTITION BY name ORDER BY food_year) AS cumulative_count
    FROM YearlyFoodCounts
)
SELECT yfc.name,
       yfc.food,
       yfc.food_year,
       yfc.food_count,
       cc.cumulative_count,
       (yfc.food_count * 100.0) / cc.cumulative_count AS percentage
FROM YearlyFoodCounts yfc
JOIN CumulativeCounts cc
    ON yfc.name = cc.name AND yfc.food_year = cc.food_year
ORDER BY yfc.name, yfc.food_year;
The result appears to be in the right format:
name   food     food_year  food_count  cumulative_count  percentage
alex   cookies  2005       1           1                 100.00000
alex   cookies  2006       1           2                 50.00000
david  apples   2020       1           1                 100.00000
david  apples   2021       1           2                 50.00000
john   pizza    2010       1           1                 100.00000
john   pizza    2011       1           2                 50.00000
john   cake     2012       1           3                 33.33333
tim    apples   2015       1           1                 100.00000
Is this the correct way to approach this problem?
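You are overcomplicating this. It does not need joins or subqueries; you can do it at a single level using window functions. You can place ordinary aggregates inside a window function, because window functions are evaluated after normal aggregation.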
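As a small illustration of an aggregate nested inside a window function, the sketch below runs such a query from Python through the standard-library sqlite3 module. SQLite (3.25 or later, for window function support) is only a stand-in here, since the question does not say which database is in use, and the table is assumed to store the year in a column named food_year, as the queries do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the year lives in a column called food_year.
conn.execute("CREATE TABLE myt (name TEXT, food TEXT, food_year INT)")
conn.executemany(
    "INSERT INTO myt VALUES (?, ?, ?)",
    [
        ("john", "pizza", 2010), ("john", "pizza", 2011), ("john", "cake", 2012),
        ("tim", "apples", 2015),
        ("david", "apples", 2020), ("david", "apples", 2021),
        ("alex", "cookies", 2005), ("alex", "cookies", 2006),
    ],
)

# COUNT(*) is computed per group first; the window SUM then runs over
# the grouped rows, so it adds up the per-name counts.
per_name = conn.execute("""
    SELECT name,
           COUNT(*)              AS rows_for_name,
           SUM(COUNT(*)) OVER () AS rows_overall
    FROM myt
    GROUP BY name
    ORDER BY name
""").fetchall()

print(per_name)
# [('alex', 2, 8), ('david', 2, 8), ('john', 3, 8), ('tim', 1, 8)]
```

The inner COUNT(*) belongs to the GROUP BY, while the outer SUM(...) OVER () sees one row per name, which is why no join back to a totals CTE is needed.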
Notes:
The query uses ROWS UNBOUNDED PRECEDING explicitly, because the default frame is RANGE UNBOUNDED PRECEDING, which is subtly different: RANGE also includes any rows that tie with the current row on the ORDER BY value.
There can be only one food per name, food_year, so you should aggregate by those two columns only.
SELECT
    name,
    MIN(food) AS food,
    food_year,
    COUNT(*) AS food_count,
    SUM(COUNT(*)) OVER (PARTITION BY name ORDER BY food_year ROWS UNBOUNDED PRECEDING) AS cumulative_count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY name ORDER BY food_year ROWS UNBOUNDED PRECEDING) AS percentage
FROM myt
GROUP BY
    name,
    food_year;
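The difference between the two frames only shows up when the ORDER BY column has ties. A minimal sketch of that, again using Python's sqlite3 module as a stand-in database; the table t and its columns k and v are hypothetical and exist only for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a tie on k = 2.
conn.execute("CREATE TABLE t (k INT, v INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 20), (3, 40)])

# range_sum uses the default frame (RANGE ... CURRENT ROW), which pulls in
# every row tied with the current one; rows_sum stops at the current row.
frames = sorted(conn.execute("""
    SELECT k,
           SUM(v) OVER (ORDER BY k)                          AS range_sum,
           SUM(v) OVER (ORDER BY k ROWS UNBOUNDED PRECEDING) AS rows_sum
    FROM t
""").fetchall())

print(frames)
# [(1, 10, 10), (2, 50, 30), (2, 50, 50), (3, 90, 90)]
```

In the query above, grouping by name, food_year leaves one row per year within each name, so there are no ties and both frames would return the same numbers; spelling out ROWS simply makes the intended running total explicit.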