How do I use AVG with PARTITION BY correctly?


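I have data containing user IDs, visitStartTime, and the prices of products that users viewed. I am trying to get the average and maximum price per user visit, but my query does not compute over the partition I want (user + visitStartTime); it computes over the user_id partition only.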

Here is my query:

select distinct fullVisitorId ,visitStartTime,
    avg(pr) over (partition by visitStartTime,fullVisitorId) as avgPrice,
    max(pr) over (partition by fullVisitorId,visitStartTime) as maxPrice
from dataset

This is what I get:

+-----+-------------------+----------------+----------+----------+
| Row |   fullVisitorId   | visitStartTime | avgPrice | maxPrice |
+-----+-------------------+----------------+----------+----------+
|   1 | 64217461724617261 |     1538478049 |    484.5 |    969.0 |
|   2 | 64217461724617261 |     1538424725 |    484.5 |    969.0 |
+-----+-------------------+----------------+----------+----------+

What is my query missing?

Sample data:

+---------------+----------------+---------------+
| FullVisitorId | VisitStartTime | ProductPrice  |
+---------------+----------------+---------------+
|           123 |       72631241 |           100 |
|           123 |       72631241 |           250 |
|           123 |       72631241 |            10 |
|           123 |       73827882 |            70 |
|           123 |       73827882 |            90 |
+---------------+----------------+---------------+

Desired result:

+-----+---------------+----------------+----------+----------+
| Row | fullVisitorId | visitStartTime | avgPrice | maxPrice |
+-----+---------------+----------------+----------+----------+
|   1 |           123 |       72631241 |    120.0 |    250.0 |
|   2 |           123 |       73827882 |     80.0 |     90.0 |
+-----+---------------+----------------+----------+----------+
sql google-bigquery
1 answer (2 votes)

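You don't need PARTITION BY in this case. A plain GROUP BY on both columns returns one row per (fullVisitorId, visitStartTime) pair, so DISTINCT is not needed either.

Try this: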

select fullVisitorId ,visitStartTime, avg(ProductPrice) avgPrice ,max(ProductPrice) maxPrice
from sample
group by FullVisitorId,VisitStartTime;

(The query is fairly standard SQL, so I think you can use it in BigQuery.)

Here is the output using PostgreSQL: DB<>FIDDLE

Update

This also works in BigQuery Standard SQL:

#standardSQL
SELECT 
  FullVisitorId, 
  VisitStartTime, 
  AVG(ProductPrice) as avgPrice,
  MAX(ProductPrice) as maxPrice
FROM `project.dataset.table`
GROUP BY FullVisitorId, VisitStartTime 

If you want to test it:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 FullVisitorId, 72631241 VisitStartTime, 100 ProductPrice 
  UNION ALL SELECT 123, 72631241, 250
  UNION ALL SELECT 123, 72631241, 10
  UNION ALL SELECT 123, 73827882, 70
  UNION ALL SELECT 123, 73827882, 90
)

SELECT 
  FullVisitorId, 
  VisitStartTime, 
  AVG(ProductPrice) as avgPrice,
  MAX(ProductPrice) as maxPrice
FROM `project.dataset.table`
GROUP BY FullVisitorId, VisitStartTime  
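
For anyone without BigQuery access, here is a minimal local sketch that runs the same aggregation on the sample data using Python's built-in sqlite3 module. SQLite is only a stand-in for BigQuery here, and the table name `sample` and the `run` helper are assumptions made for this illustration. It also runs a windowed DISTINCT variant that partitions by both columns (this needs SQLite 3.25 or newer). On this sample data the two forms return the same rows, so the partitioning itself may not be what is wrong with the original query; that cannot be confirmed from the question alone.

```python
import sqlite3

# Sample rows from the question: (FullVisitorId, VisitStartTime, ProductPrice)
ROWS = [
    (123, 72631241, 100),
    (123, 72631241, 250),
    (123, 72631241, 10),
    (123, 73827882, 70),
    (123, 73827882, 90),
]

# The aggregation from the answer: one output row per visitor/visit pair.
GROUP_BY_SQL = """
SELECT FullVisitorId, VisitStartTime,
       AVG(ProductPrice) AS avgPrice,
       MAX(ProductPrice) AS maxPrice
FROM sample
GROUP BY FullVisitorId, VisitStartTime
ORDER BY VisitStartTime
"""

# Window-function variant: one value per input row, collapsed by DISTINCT.
WINDOW_SQL = """
SELECT DISTINCT FullVisitorId, VisitStartTime,
       AVG(ProductPrice) OVER (PARTITION BY FullVisitorId, VisitStartTime) AS avgPrice,
       MAX(ProductPrice) OVER (PARTITION BY FullVisitorId, VisitStartTime) AS maxPrice
FROM sample
ORDER BY VisitStartTime
"""


def run(sql):
    """Load the sample rows into an in-memory SQLite table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sample ("
            "FullVisitorId INTEGER, VisitStartTime INTEGER, ProductPrice INTEGER)"
        )
        conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


group_by_rows = run(GROUP_BY_SQL)
window_rows = run(WINDOW_SQL)

for row in group_by_rows:
    print(row)
```

The GROUP BY form is usually the simpler choice when only per-group aggregates are needed, since it avoids computing a value for every input row and then deduplicating.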