Can someone explain why the AVG() function is giving me a weighted average in my code?
SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
INTO #totalpaid
FROM oc.students AS s
JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
GROUP BY s.country ,s.stud_id;
SELECT DISTINCT s.country, ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
FROM #totalpaid t
JOIN oc.students s ON t.stud_id = s.stud_id
JOIN oc.enrollment e ON e.stud_id = s.stud_id;
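For example, in Malta, student 12 took 1 course and paid 45 euros, and student 837 took 7 courses and paid 294 euros in total. I want the average to be a simple (45 + 294) / 2, but the system computes something like (1 * 45 + 7 * 294) / 8. What exactly am I doing wrong?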
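Because you are joining the tables twice.
If you put the SELECT ... INTO statement and the second SELECT statement together, your query is equivalent to: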
SELECT
DISTINCT s.country,
ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
FROM (
SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
FROM oc.students AS s
JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
GROUP BY s.country ,s.stud_id
) t
JOIN oc.students s ON t.stud_id = s.stud_id
JOIN oc.enrollment e ON e.stud_id = s.stud_id
You can clearly see that the students and enrollment tables are each joined twice. That is what skews the average.
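In the second query, when you join the temp table back to enrollment, you get one row per course the student enrolled in; that is where the repeated values in the totalpaid column come from.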
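The row multiplication can be reproduced with a small sketch. The tables #students, #enrollment and #totals below are hypothetical stand-ins for oc.students, oc.enrollment and #totalpaid, and the per-course fee of 42 for student 837 is an assumption chosen only so that seven courses add up to 294.

```sql
-- Hypothetical sample data matching the Malta example in the question.
CREATE TABLE #students   (stud_id INT PRIMARY KEY, country VARCHAR(50));
CREATE TABLE #enrollment (stud_id INT, paid DECIMAL(10,2));

INSERT INTO #students VALUES (12, 'Malta'), (837, 'Malta');
INSERT INTO #enrollment VALUES
    (12, 45),
    (837, 42), (837, 42), (837, 42), (837, 42),
    (837, 42), (837, 42), (837, 42);   -- 7 courses, 294 in total (assumed split)

-- One row per student: (12, 45) and (837, 294).
SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
INTO #totals
FROM #students AS s
JOIN #enrollment AS e ON e.stud_id = s.stud_id
GROUP BY s.country, s.stud_id;

-- Joining back to enrollment repeats each total once per course:
-- 1 row for student 12, 7 rows for student 837.
SELECT t.stud_id, COUNT(*) AS rows_after_join
FROM #totals AS t
JOIN #enrollment AS e ON e.stud_id = t.stud_id
GROUP BY t.stud_id;

-- Average over the 8 joined rows: (1*45 + 7*294) / 8 = 262.875
SELECT AVG(t.totalpaid) AS weighted_avg
FROM #totals AS t
JOIN #enrollment AS e ON e.stud_id = t.stud_id;

-- Average over the 2 per-student rows: (45 + 294) / 2 = 169.5
SELECT AVG(t.totalpaid) AS simple_avg
FROM #totals AS t;
```

The only difference between the last two queries is the extra join, which is enough to turn the simple average into the weighted one.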
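The second query does not use any column that is not already in the temp table, so you do not need those joins at all. This should give you what you want: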
SELECT
t.country,
ROUND(AVG(t.totalpaid),0) AS avg_country
FROM #totalpaid t
GROUP BY
t.country;
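If you would rather keep the window function, a possible alternative is to drop the GROUP BY and the extra joins and keep DISTINCT, since the windowed AVG returns the same value on every row of a country. This is only a sketch: #totalpaid_demo is a hypothetical table holding the two Malta rows from the question, standing in for #totalpaid.

```sql
-- Hypothetical stand-in for #totalpaid with one row per student.
CREATE TABLE #totalpaid_demo (stud_id INT, country VARCHAR(50), totalpaid DECIMAL(10,2));
INSERT INTO #totalpaid_demo VALUES (12, 'Malta', 45), (837, 'Malta', 294);

-- The window AVG is computed per country over the per-student rows only;
-- DISTINCT collapses the identical rows to one per country.
-- For Malta: (45 + 294) / 2 = 169.5, which ROUND(..., 0) turns into 170.
SELECT DISTINCT
    t.country,
    ROUND(AVG(t.totalpaid) OVER (PARTITION BY t.country), 0) AS avg_country
FROM #totalpaid_demo AS t;
```

The plain GROUP BY version is usually the simpler choice; the window form is mainly useful if you also want to keep per-student columns in the same result.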