I'm using a Redshift database, and I can't work out why my join or my SUM is returning values that are too large. My query is as follows:
SELECT
date(u.created_at) AS date,
count(distinct c.user_id) AS active_users,
sum(distinct insights.spend) AS fbcosts,
count(c.transaction_amount) AS share_shake_costs,
round(((sum(distinct insights.spend) + count(c.transaction_amount)) /
count(distinct c.user_id)),2) AS cac
FROM
dbname.users AS u
LEFT JOIN
dbname.card_transaction AS c ON c.user_id = u.id
LEFT JOIN
facebookads.insights ON date(insights.date_start) = date(u.created_at)
LEFT JOIN
dbname.card_transaction AS c2 ON date(c2.timestamp) = date(u.created_at)
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
date
ORDER BY
1 DESC;
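This query returns the following data:
If we look at 2017-02-08, we see a total of 1298 for "share_shake_costs". However, if I run the same aggregation against the card_transaction table alone, I get the following, correct result.
The query against that single table looks like this: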
SELECT
date(timestamp),
sum(transaction_amount)
FROM
dbname.card_transaction AS c2
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
1
ORDER BY
1 DESC;
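I suspect my "fbcosts" column has a similar problem. I think it is related to my joins, since SUM itself should work fine.
I'm new to Redshift and SQL, so maybe there is a better way to write the whole query. Am I missing something obvious?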
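It looks like you have tables in a 1:n relationship, so when you join them on a common key, each value is counted n times.
Say you have a table, orders, which contains user_id and the total bill amount (the amount column in the diagram), and another table, order_details, which contains the individual items ordered by that user_id.
If you do a left join, then by definition each orders.user_id row is joined to every matching order_details.user_id row, that is, n times, where
n = number of rows in order_details with that user_id
and every aggregate (sum, count, etc.) is computed over those n copies.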
+------------------+ +----------------------+
| orders | | order_details |
+------------------+ +----------------------+
|amount user_id | | user_id items |
+------------------+ +----------------------+
| 1000 123 ---------> | 123 apple |
+ +----------------------+
+-------------> | 123 guava |
| +----------------------+
v-------------> | 123 mango |
+----------------------+
select sum(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3000
select count(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3
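To reproduce this yourself, here is a minimal sketch that builds the two toy tables from the diagram and then shows one common way to avoid the fan-out: aggregate the "many" side in a subquery before joining. The column types and the item_count alias are assumptions made for illustration.

```sql
-- Hypothetical toy tables mirroring the diagram (types are assumed).
CREATE TEMP TABLE orders (amount INT, user_id INT);
CREATE TEMP TABLE order_details (user_id INT, items VARCHAR(20));

INSERT INTO orders VALUES (1000, 123);
INSERT INTO order_details VALUES (123, 'apple'), (123, 'guava'), (123, 'mango');

-- Fan-out: the single orders row is repeated once per matching detail row,
-- so amount is added three times.
SELECT sum(o.amount) AS total_amount, count(od.items) AS item_count
FROM orders o
LEFT JOIN order_details od ON o.user_id = od.user_id;
-- total_amount = 3000, item_count = 3

-- Pre-aggregate order_details to one row per user_id, then join.
-- The join is now 1:1, so amount is added only once.
SELECT sum(o.amount) AS total_amount, sum(od.item_count) AS item_count
FROM orders o
LEFT JOIN (
    SELECT user_id, count(*) AS item_count
    FROM order_details
    GROUP BY user_id
) od ON o.user_id = od.user_id;
-- total_amount = 1000, item_count = 3
```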
I hope this makes the reason for the inflated counts clear.
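Applied to the query in the question, the same idea might look roughly like the sketch below: each source is aggregated to one row per day first, and only then are the per-day results joined. This is untested against your data and rests on some guesses: that share_shake_costs should be sum(transaction_amount) as in your single-table query, that fbcosts should be a plain sum of spend per day, and that days without share/shake transactions should still appear. The CTE names and the dt alias are made up for illustration.

```sql
WITH users_by_day AS (
    -- Users created on each day who have at least one card transaction.
    -- count(DISTINCT ...) is not affected by the row fan-out of this join.
    SELECT date(u.created_at) AS dt,
           count(DISTINCT c.user_id) AS active_users
    FROM dbname.users AS u
    LEFT JOIN dbname.card_transaction AS c ON c.user_id = u.id
    GROUP BY 1
),
fb_by_day AS (
    -- One row per day of Facebook spend.
    SELECT date(date_start) AS dt,
           sum(spend) AS fbcosts
    FROM facebookads.insights
    GROUP BY 1
),
share_shake_by_day AS (
    -- One row per day of share / shake-to-win amounts.
    SELECT date(timestamp) AS dt,
           sum(transaction_amount) AS share_shake_costs
    FROM dbname.card_transaction
    WHERE (vendor_transaction_description ILIKE '%share%'
        OR vendor_transaction_description ILIKE '%shake to win%')
    GROUP BY 1
)
SELECT d.dt AS date,
       d.active_users,
       f.fbcosts,
       s.share_shake_costs,
       -- nullif avoids division by zero on days with no active users.
       round((coalesce(f.fbcosts, 0) + coalesce(s.share_shake_costs, 0))
             / nullif(d.active_users, 0), 2) AS cac
FROM users_by_day AS d
LEFT JOIN fb_by_day AS f ON f.dt = d.dt
LEFT JOIN share_shake_by_day AS s ON s.dt = d.dt
ORDER BY 1 DESC;
```

Because every subquery returns at most one row per day, the final joins are 1:1 and no sum is repeated.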
PS: Also, always prefer to wrap OR conditions in parentheses.
WHERE
(c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%')
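The parentheses make no difference while the WHERE clause contains only ORs, but they start to matter as soon as an AND is added, because AND binds more tightly than OR. A small sketch with a made-up table (tx, description and amount are hypothetical names; ILIKE assumes Redshift or PostgreSQL):

```sql
-- Hypothetical sample data.
CREATE TEMP TABLE tx (description VARCHAR(50), amount INT);
INSERT INTO tx VALUES ('share bonus', 10), ('shake to win', 20), ('share bonus', 200);

-- Without parentheses this is read as: share OR (shake to win AND amount < 100),
-- so the 200 row slips through.
SELECT count(*) FROM tx
WHERE description ILIKE '%share%'
   OR description ILIKE '%shake to win%' AND amount < 100;
-- 3

-- With parentheses the amount filter applies to both descriptions.
SELECT count(*) FROM tx
WHERE (description ILIKE '%share%'
    OR description ILIKE '%shake to win%')
  AND amount < 100;
-- 2
```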