My data in the table looks like this:
id  Author_ID  Research_Area        Category_ID  Paper_Count  Paper_Year  Rank
---------------------------------------------------------------------------------
1   677        feature extraction   8            1            2005        1
2   677        image annotation     11           1            2005        2
3   677        probabilistic model  12           1            2005        3
4   677        semantic             19           1            2007        1
5   677        feature extraction   8            1            2009        1
6   677        image annotation     11           1            2011        1
7   677        semantic             19           1            2012        1
8   677        video sequence       5            2            2013        1
9   1359       adversary model      1            2            2005        1
10  1359       ensemble method      14           2            2005        2
11  1359       image represent      11           2            2005        3
12  1359       adversary model      1            7            2006        1
13  1359       concurrency control  17           5            2006        2
14  1359       information system   12           2            2006        3
15  ...
16  ...
And I want the query output to be:
id  Author_ID  Category_ID  Paper_Count  Category_Prob  Paper_Year  Rank
---------------------------------------------------------------------------------
1   677        8            1            0.333          2005        1
2   677        11           1            0.333          2005        2
3   677        12           1            0.333          2005        3
4   677        19           1            1.0            2007        1
5   677        8            1            1.0            2009        1
6   677        11           1            1.0            2011        1
7   677        19           1            1.0            2012        1
8   677        5            2            1.0            2013        1
9   1359       1            2            0.333          2005        1
10  1359       14           2            0.333          2005        2
11  1359       11           2            0.333          2005        3
12  1359       1            7            0.5            2006        1
13  1359       17           5            0.357          2006        2
14  1359       12           2            0.142          2006        3
15  ...
16  ...
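Here Category_Prob is a computed column, calculated in two steps:
First, we need the SUM of Paper_Count for each Paper_Year of each author. For example, for Paper_Year = 2005 and Author_ID = 677, SUM(Paper_Count) = 3.
Second, for each Category_ID, we divide its Paper_Count by the SUM(Paper_Count) of that Paper_Year, i.e. 1/3, which is 0.333, and so on.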
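The two steps above can be sketched literally as a grouped subquery joined back to the detail rows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for the real database, the helper name two_step_probs is hypothetical, and the rows are a subset of the sample data shown in the question.

```python
import sqlite3

# Subset of the sample rows: (Author_ID, Category_ID, Paper_Count, Paper_Year)
ROWS = [
    (677, 8, 1, 2005), (677, 11, 1, 2005), (677, 12, 1, 2005),
    (677, 19, 1, 2007),
    (1359, 1, 7, 2006), (1359, 17, 5, 2006), (1359, 12, 2, 2006),
]

def two_step_probs(rows):
    """Hypothetical helper: per-author, per-year share of each category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Author_Areas ("
        "Author_ID INT, Category_ID INT, Paper_Count INT, Paper_Year INT)"
    )
    conn.executemany("INSERT INTO Author_Areas VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT a.Author_ID, a.Paper_Year, a.Category_ID,
               -- step 2: divide each row's count by the total of its group
               a.Paper_Count * 1.0 / t.Total AS Category_Prob
        FROM Author_Areas AS a
        JOIN (
            -- step 1: one total per (Author_ID, Paper_Year)
            SELECT Author_ID, Paper_Year, SUM(Paper_Count) AS Total
            FROM Author_Areas
            GROUP BY Author_ID, Paper_Year
        ) AS t
          ON t.Author_ID = a.Author_ID AND t.Paper_Year = a.Paper_Year
        ORDER BY a.Author_ID, a.Paper_Year, a.Category_ID
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in two_step_probs(ROWS):
        print(row)
```

For author 1359 in 2006 the total is 7 + 5 + 2 = 14, so the three categories get 7/14, 5/14 and 2/14, which agrees with the 0.5, 0.357 and 0.142 in the desired output.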
I have tried this query:
SELECT
Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = Paper_Count / SUM(Paper_Count),
Paper_Year, Rank
FROM
Author_Areas
GROUP BY
Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY
Author_ID, Paper_Year
But it returns only 1 in the Category_Prob column for every row in the table.
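The problem with your query is that you are grouping not just by Paper_Year but also by Author_ID, Abstract_Category, Paper_Count, and Rank. With your data each group then contains a single row, so SUM(Paper_Count) equals the Paper_Count of that group and the ratio is always 1.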
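To see why the ratio collapses to 1, here is a small sketch that counts the rows in each group when grouping by every column. It assumes an in-memory SQLite table as a stand-in for the real one, uses Category_ID in place of Abstract_Category, and the helper name group_sizes is hypothetical.

```python
import sqlite3

# Subset of the sample rows:
# (Author_ID, Category_ID, Paper_Count, Paper_Year, Rank)
ROWS = [
    (677, 8, 1, 2005, 1), (677, 11, 1, 2005, 2), (677, 12, 1, 2005, 3),
    (1359, 1, 7, 2006, 1), (1359, 17, 5, 2006, 2), (1359, 12, 2, 2006, 3),
]

def group_sizes(rows):
    """Hypothetical helper: (Paper_Count, group total, rows in group) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Author_Areas ("
        'Author_ID INT, Category_ID INT, Paper_Count INT, Paper_Year INT, "Rank" INT)'
    )
    conn.executemany("INSERT INTO Author_Areas VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT Paper_Count,
               SUM(Paper_Count) AS Total,
               COUNT(*)         AS Rows_In_Group
        FROM Author_Areas
        -- same grouping as the query in the question: every column is a key
        GROUP BY Author_ID, Category_ID, Paper_Year, Paper_Count, "Rank"
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for paper_count, total, n in group_sizes(ROWS):
        print(paper_count, total, n)
```

Every group here holds exactly one row, so Total is always the row's own Paper_Count and Paper_Count / Total can only be 1.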
You can use SUM OVER instead:
SELECT id, Author_ID, Abstract_Category [Category_ID],
Paper_Count,
Paper_Count * 1.0 / SUM(Paper_Count)
OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year
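Note: you have to multiply by 1.0 to avoid integer division.
Note 2: if your actual requirement is to total per author as well, the Author_ID field has to be in the PARTITION BY clause, as it is in the query above.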
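The effect of the window function and of the 1.0 multiplier can be checked side by side with a small sketch. It assumes an in-memory SQLite database (version 3.25 or later, which is needed for window functions) rather than SQL Server, and the helper name window_probs is hypothetical.

```python
import sqlite3

# Subset of the sample rows: (Author_ID, Category_ID, Paper_Count, Paper_Year)
ROWS = [
    (1359, 1, 7, 2006), (1359, 17, 5, 2006), (1359, 12, 2, 2006),
    (677, 19, 1, 2007),
]

def window_probs(rows):
    """Hypothetical helper: (Category_ID, integer ratio, decimal ratio) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Author_Areas ("
        "Author_ID INT, Category_ID INT, Paper_Count INT, Paper_Year INT)"
    )
    conn.executemany("INSERT INTO Author_Areas VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT Category_ID,
               -- integer / integer: the fractional part is lost
               Paper_Count / SUM(Paper_Count)
                   OVER (PARTITION BY Author_ID, Paper_Year) AS Int_Div,
               -- multiplying by 1.0 first forces a non-integer division
               Paper_Count * 1.0 / SUM(Paper_Count)
                   OVER (PARTITION BY Author_ID, Paper_Year) AS Category_Prob
        FROM Author_Areas
        ORDER BY Author_ID, Paper_Year, Category_ID
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in window_probs(ROWS):
        print(row)
```

Unlike GROUP BY, the OVER clause keeps every detail row and only attaches the per-partition total to it, which is why no join back to the table is needed.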
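I suspect (please confirm) that the data type of all the fields involved is integer. When you calculate with int, the return type is also int. You should convert the fields to decimal before doing the calculation.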
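SELECT Author_ID, Abstract_Category, Paper_Count,
    -- convert both operands to decimal so the division is not an integer division;
    -- the total is taken per author and year, since grouping by every column
    -- would make SUM(Paper_Count) equal to Paper_Count
    [Category_Prob] = convert(decimal(10,3), Paper_Count)
        / convert(decimal(10,3), SUM(Paper_Count) OVER (PARTITION BY Author_ID, Paper_Year)),
    Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year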