AVG() computes a weighted average in SQL Server

Question  Votes: 0  Answers: 2

Can someone explain why the AVG() function gives me a weighted average in this code?

SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
    INTO #totalpaid 
    FROM oc.students AS s
    JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
GROUP BY s.country ,s.stud_id;

SELECT DISTINCT s.country, ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
    FROM #totalpaid t
    JOIN oc.students s ON t.stud_id = s.stud_id
    JOIN oc.enrollment e ON e.stud_id = s.stud_id; 

For example, in Malta, student 12 took 1 course and paid 45 euros, while student 837 took 7 courses and paid 294 euros. I want the simple average, (45 + 294) / 2, but the query computes something like (1 * 45 + 7 * 294) / 8. What am I doing wrong?

sql sql-server tsql
2 Answers
1
vote

Because you are joining the tables twice.

If you combine the SELECT ... INTO statement with the second SELECT, your query is equivalent to:

SELECT
  DISTINCT s.country, 
  ROUND(AVG(t.totalpaid) OVER (PARTITION BY s.country),0) AS avg_country
FROM (
  SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
  FROM oc.students AS s
  JOIN oc.enrollment AS e ON s.stud_id = e.stud_id
  GROUP BY s.country ,s.stud_id    
) t
JOIN oc.students s ON t.stud_id = s.stud_id
JOIN oc.enrollment e ON e.stud_id = s.stud_id

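You can clearly see that the students and enrollment tables are each joined twice. The second join to enrollment repeats each student's total once per enrollment row, which skews the average.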
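The row duplication can be reproduced with a small sketch. The table variables and sample rows below are hypothetical stand-ins for oc.students and oc.enrollment, built to match the Malta example from the question (student 837's 294 euros is assumed to be seven payments of 42).

```sql
-- Hypothetical sample data mirroring the Malta example; not the real oc tables.
DECLARE @students TABLE (stud_id INT, country VARCHAR(50));
DECLARE @enrollment TABLE (stud_id INT, paid DECIMAL(10, 2));

INSERT INTO @students VALUES (12, 'Malta'), (837, 'Malta');
INSERT INTO @enrollment VALUES
    (12, 45),
    (837, 42), (837, 42), (837, 42), (837, 42),
    (837, 42), (837, 42), (837, 42);

-- Per-student totals: one row per student (45 and 294).
WITH totalpaid AS (
    SELECT s.stud_id, s.country, SUM(e.paid) AS totalpaid
    FROM @students AS s
    JOIN @enrollment AS e ON s.stud_id = e.stud_id
    GROUP BY s.country, s.stud_id
)
-- Rejoining to enrollment repeats each total once per enrollment row:
-- 1 row for student 12 and 7 rows for student 837, so 8 rows in all,
-- and the average becomes (1 * 45 + 7 * 294) / 8 = 262.875.
SELECT
    t.country,
    COUNT(*)         AS rows_after_rejoin,
    AVG(t.totalpaid) AS skewed_avg
FROM totalpaid AS t
JOIN @students   AS s ON t.stud_id = s.stud_id
JOIN @enrollment AS e ON e.stud_id = s.stud_id
GROUP BY t.country;
```

Dropping the two joins in the outer query leaves two rows for Malta, and the average falls back to (45 + 294) / 2.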


1
vote

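In the second query, when you join the temp table back to enrollment, it produces one row per class; that is where the multiple copies of each totalpaid value come from.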

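The second query does not use any column that is not already in the temp table, so you don't need those joins at all. This should produce what you want.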

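-- One row per student in #totalpaid, so a plain grouped AVG is the simple average.
-- (A windowed AVG(...) OVER cannot reference the ungrouped totalpaid column
-- once GROUP BY t.country is present, so the OVER clause is dropped.)
SELECT 
  t.country, 
  ROUND(AVG(t.totalpaid), 0) AS avg_country
FROM #totalpaid t
GROUP BY 
  t.country;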
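As a quick check of the arithmetic, here is a minimal sketch that averages one row per student. The inline VALUES rows are hypothetical and stand in for the contents of #totalpaid in the Malta example.

```sql
-- Hypothetical stand-in for #totalpaid: one row per student.
WITH totalpaid (stud_id, country, totalpaid) AS (
    SELECT v.stud_id, v.country, v.totalpaid
    FROM (VALUES (12, 'Malta', 45.0),
                 (837, 'Malta', 294.0)) AS v (stud_id, country, totalpaid)
)
SELECT
    t.country,
    AVG(t.totalpaid)           AS avg_exact,   -- (45 + 294) / 2 = 169.5
    ROUND(AVG(t.totalpaid), 0) AS avg_country  -- rounds to 170
FROM totalpaid AS t
GROUP BY t.country;
```

The totals are written as decimals here on purpose: in SQL Server, AVG over an integer column returns an integer, so if paid is stored as INT the result would be truncated before ROUND is applied.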