Calculated column in SQL Server

Question (votes: 9, answers: 2)

My data in the table looks like this:

id  Author_ID   Research_Area       Category_ID  Paper_Count   Paper_Year   Rank  
---------------------------------------------------------------------------------
1   677         feature extraction  8            1             2005         1
2   677         image annotation    11           1             2005         2
3   677         probabilistic model 12           1             2005         3
4   677         semantic            19           1             2007         1
5   677         feature extraction  8            1             2009         1
6   677         image annotation    11           1             2011         1  
7   677         semantic            19           1             2012         1  
8   677         video sequence      5            2             2013         1  
9   1359        adversary model     1            2             2005         1
10  1359        ensemble method     14           2             2005         2
11  1359        image represent     11           2             2005         3
12  1359        adversary model     1            7             2006         1
13  1359        concurrency control 17           5             2006         2
14  1359        information system  12           2             2006         3  
15  ...         
16  ...  

And I want the query output to be:

id  Author_ID   Category_ID  Paper_Count   Category_Prob   Paper_Year   Rank  
---------------------------------------------------------------------------------
1   677         8            1             0.333           2005         1
2   677         11           1             0.333           2005         2
3   677         12           1             0.333           2005         3
4   677         19           1             1.0             2007         1
5   677         8            1             1.0             2009         1
6   677         11           1             1.0             2011         1  
7   677         19           1             1.0             2012         1  
8   677         5            2             1.0             2013         1  
9   1359        1            2             0.333           2005         1
10  1359        14           2             0.333           2005         2
11  1359        11           2             0.333           2005         3
12  1359        1            7             0.5             2006         1
13  1359        17           5             0.357           2006         2
14  1359        12           2             0.142           2006         3  
15  ...         
16  ...  

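Category_Prob is a calculated column, computed in two steps:

First, we need the SUM of Paper_Count for each Paper_Year of an author. For example, for Paper_Year = 2005 and Author_ID = 677, SUM(Paper_Count) = 3.

Second, for each Category_ID, we divide Paper_Count by the SUM(Paper_Count) of that Paper_Year, i.e. 1/3, which is 0.333, and so on.

I tried this query: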

SELECT 
    Author_ID, Abstract_Category, Paper_Count,
    [Category_Prob] = Paper_Count / SUM(Paper_Count),
    Paper_Year, Rank
FROM 
    Author_Areas
GROUP BY 
    Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY 
    Author_ID, Paper_Year

But it returns only 1 in the Category_Prob column for every row in the table.

sql-server calculated-columns
2 Answers
6
votes

The problem with your query is that you are grouping not just by Paper_Year but also by Author_ID, Abstract_Category, Paper_Count, and Rank. As a result, SUM(Paper_Count) equals Paper_Count in every group.
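To make the grouping effect concrete, here is a minimal sketch. It assumes a table variable named @Author_Areas (hypothetical, standing in for the real Author_Areas table) loaded with the three 2005 rows for Author_ID 677 from the question, and INT column types, which the question does not confirm.

```sql
-- Assumption: a stand-in table variable with INT columns; the real table may differ.
DECLARE @Author_Areas TABLE (
    Author_ID INT, Abstract_Category INT, Paper_Count INT,
    Paper_Year INT, [Rank] INT
);

INSERT INTO @Author_Areas VALUES
    (677,  8, 1, 2005, 1),
    (677, 11, 1, 2005, 2),
    (677, 12, 1, 2005, 3);

-- Grouping by every column: each group holds a single row,
-- so Group_Sum is 1 on all three rows (the same as Paper_Count).
SELECT Author_ID, Abstract_Category, Paper_Year, Paper_Count,
       SUM(Paper_Count) AS Group_Sum
FROM @Author_Areas
GROUP BY Author_ID, Abstract_Category, Paper_Year, Paper_Count, [Rank];

-- Grouping by author and year only: one row, with Year_Sum = 3.
SELECT Author_ID, Paper_Year, SUM(Paper_Count) AS Year_Sum
FROM @Author_Areas
GROUP BY Author_ID, Paper_Year;
```

The second query gives the denominator the question asks for, but it collapses the detail rows, which is why a plain GROUP BY cannot return both in one pass.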

You can use SUM with an OVER clause:

SELECT      id, Author_ID, Abstract_Category [Category_ID],  
            Paper_Count, 
            Paper_Count * 1.0 / SUM(Paper_Count)  
            OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
            Paper_Year, Rank
FROM        Author_Areas
ORDER BY    Author_ID, Paper_Year

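Note: you must multiply by 1.0 to avoid integer division. Note 2: if your actual requirement is to compute the sum per author as well, Author_ID has to be in the PARTITION BY clause, as in the query above.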
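The effect of the 1.0 multiplier can be checked on a small sample. The sketch below assumes a hypothetical table variable @Author_Areas with INT columns, holding the three 2006 rows for Author_ID 1359 from the question. The CAST to DECIMAL(10, 3) is an addition for display and is not part of the answer's query.

```sql
-- Assumption: a stand-in table variable with INT columns; the real table may differ.
DECLARE @Author_Areas TABLE (
    Author_ID INT, Abstract_Category INT, Paper_Count INT,
    Paper_Year INT, [Rank] INT
);

INSERT INTO @Author_Areas VALUES
    (1359,  1, 7, 2006, 1),
    (1359, 17, 5, 2006, 2),
    (1359, 12, 2, 2006, 3);

SELECT Author_ID,
       Abstract_Category AS Category_ID,
       Paper_Count,
       -- INT / INT truncates toward zero, so this column is 0 on every row here.
       Paper_Count / SUM(Paper_Count)
           OVER (PARTITION BY Author_ID, Paper_Year) AS Int_Prob,
       -- Multiplying by 1.0 first forces decimal arithmetic;
       -- the CAST rounds the result to three decimal places.
       CAST(Paper_Count * 1.0 / SUM(Paper_Count)
           OVER (PARTITION BY Author_ID, Paper_Year) AS DECIMAL(10, 3)) AS Category_Prob,
       Paper_Year,
       [Rank]
FROM @Author_Areas
ORDER BY Author_ID, Paper_Year, [Rank];
```

Because the CAST rounds, the last row comes out as 0.143, whereas the desired output in the question shows 0.142, which looks truncated. If truncation is really wanted, ROUND(expression, 3, 1) is one way to get it, since a non-zero third argument makes ROUND truncate.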


0
votes

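I suspect (please confirm) that all the fields involved are of an integer data type. When you calculate with int, the return type is also int. You should convert the fields to decimal before calculating: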

SELECT Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = convert(decimal(10,3), Paper_Count) / convert(decimal(10, 3), SUM(Paper_Count)),
Paper_Year, Rank
FROM Author_Areas
GROUP BY Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY Author_ID, Paper_Year
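
Note that the conversion alone does not change the grouping: with the sample rows from the question, every group still contains a single row, so the ratio would come out as 1.000. One possible way to combine the decimal conversion with the two steps described in the question is sketched below. The table variable @Author_Areas, its INT column types, and the CTE name Year_Totals are assumptions made for illustration.

```sql
-- Assumption: a stand-in table variable with INT columns; the real table may differ.
DECLARE @Author_Areas TABLE (
    Author_ID INT, Abstract_Category INT, Paper_Count INT,
    Paper_Year INT, [Rank] INT
);

INSERT INTO @Author_Areas VALUES
    (677,   8, 1, 2005, 1),
    (677,  11, 1, 2005, 2),
    (677,  12, 1, 2005, 3),
    (677,  19, 1, 2007, 1);

-- Step 1: total Paper_Count per author and year.
WITH Year_Totals AS (
    SELECT Author_ID, Paper_Year, SUM(Paper_Count) AS Year_Sum
    FROM @Author_Areas
    GROUP BY Author_ID, Paper_Year
)
-- Step 2: join the totals back to the detail rows and divide as decimal.
SELECT a.Author_ID,
       a.Abstract_Category AS Category_ID,
       a.Paper_Count,
       CONVERT(DECIMAL(10, 3),
               CONVERT(DECIMAL(10, 3), a.Paper_Count) / t.Year_Sum) AS Category_Prob,
       a.Paper_Year,
       a.[Rank]
FROM @Author_Areas AS a
JOIN Year_Totals AS t
    ON t.Author_ID = a.Author_ID
   AND t.Paper_Year = a.Paper_Year
ORDER BY a.Author_ID, a.Paper_Year, a.[Rank];
```

For these rows Category_Prob is 0.333 for the three 2005 entries and 1.000 for the 2007 entry. This is the join-based equivalent of SUM with an OVER clause.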