SQL JOIN or SUM returns too many values in a Redshift database

Question    Votes: 0    Answers: 1

I'm working with a Redshift database and can't figure out why my JOIN or SUM is returning values that are too large. My query is as follows:

SELECT 
    date(u.created_at) AS date,
    count(distinct c.user_id) AS active_users,
    sum(distinct insights.spend) AS fbcosts,
    count(c.transaction_amount) AS share_shake_costs,
    round(((sum(distinct insights.spend) + count(c.transaction_amount)) / 
    count(distinct c.user_id)),2) AS cac
FROM 
    dbname.users AS u
LEFT JOIN
    dbname.card_transaction AS c ON c.user_id = u.id
LEFT JOIN
    facebookads.insights ON date(insights.date_start) = date(u.created_at)
LEFT JOIN
    dbname.card_transaction AS c2 ON date(c2.timestamp) = date(u.created_at)
WHERE 
    c2.vendor_transaction_description ilike '%share%'
    OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY 
    date
ORDER BY 
    1 DESC;

This query returns the following data:

[Image: table data from the query above]

If we look at 2017-02-08, we see a total of 1298 for "share_shake_costs". However, if I run the same filter against the card_transaction table alone, I get the following, correct, results.


The query for the second table looks like this:

SELECT 
    date(timestamp),
    sum(transaction_amount)
FROM 
    dbname.card_transaction AS c2
WHERE 
    c2.vendor_transaction_description ilike '%share%'
    OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY 
    1
ORDER BY 
    1 DESC;

I suspect my "fbcosts" column has a similar problem. I think it has to do with my joins, because SUM itself should work fine.

I'm new to Redshift and SQL, so maybe there is a better way to write this whole query. Am I missing something obvious?

sql join amazon-redshift
1 Answer
0
votes

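It looks like you are joining to a table with a 1:n mapping. When you join on the common key, each value on the "1" side is repeated n times, so it is also aggregated n times.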
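One way to check whether this applies to the tables in the question is to count how many rows each table contributes per calendar date, since the query joins on dates. This is only a diagnostic sketch: the table and column names are taken from the question, while the output aliases (source_table, dt, row_count) are made up for illustration.

```sql
-- Rows per calendar date on each side of the date-based joins.
-- Any date with row_count > 1 in more than one table will fan out
-- when those tables are joined on the date alone.
SELECT 'users' AS source_table, date(u.created_at) AS dt, COUNT(*) AS row_count
FROM dbname.users AS u
GROUP BY 2
UNION ALL
SELECT 'insights', date(i.date_start), COUNT(*)
FROM facebookads.insights AS i
GROUP BY 2
UNION ALL
SELECT 'card_transaction', date(c.timestamp), COUNT(*)
FROM dbname.card_transaction AS c
GROUP BY 2
ORDER BY dt DESC, source_table;
```

If, say, a date has 10 users, 5 insights rows and 20 matching card transactions, the joined result has up to 10 * 5 * 20 rows for that date before any aggregation happens.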

Say one table, orders, contains user_id and the total bill amount (amount), and another table, order_details, contains the individual items ordered by that user_id.

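If you do a left join, then by definition each orders row is joined to every order_details row with the same user_id, so it appears n times in the result, where

n = number of rows in order_details with that user_id

and any aggregate (sum, count, etc.) then sees that orders row n times.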

+------------------+          +----------------------+
|      orders      |          |    order_details     |
+------------------+          +----------------------+
|amount    user_id |          | user_id       items  |
+------------------+          +----------------------+
| 1000       123   ---------> |   123         apple  |
              +               +----------------------+
              +-------------> |   123         guava  |
              |               +----------------------+
              v-------------> |   123         mango  |
                              +----------------------+

select sum(amount) from orders o left join order_details od 
on o.user_id = od.user_id; -- result: 3000

select count(amount) from orders o left join order_details od 
on o.user_id = od.user_id; -- result: 3

I hope the reason for the inflated counts is clear now.
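A common way around this is to collapse the "n" side to one row per key before joining, so the join becomes 1:1. Here is a minimal sketch using the orders / order_details example from the diagram. The temp tables, the column types and the item_count alias are assumptions for illustration only.

```sql
-- Hypothetical tables matching the diagram above.
CREATE TEMP TABLE orders (amount INT, user_id INT);
CREATE TEMP TABLE order_details (user_id INT, items VARCHAR(20));

INSERT INTO orders VALUES (1000, 123);
INSERT INTO order_details VALUES (123, 'apple'), (123, 'guava'), (123, 'mango');

-- Aggregate order_details down to one row per user_id first,
-- so each orders row matches at most one row in the join.
SELECT
    SUM(o.amount)      AS total_amount,
    SUM(od.item_count) AS total_items
FROM orders AS o
LEFT JOIN (
    SELECT user_id, COUNT(*) AS item_count
    FROM order_details
    GROUP BY user_id
) AS od ON od.user_id = o.user_id;
-- total_amount = 1000, total_items = 3
```

The amount is now summed once per order, while the item count still reflects all three detail rows.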

PS: Also, always wrap OR conditions in parentheses.

WHERE 
    (c2.vendor_transaction_description ilike '%share%'
    OR c2.vendor_transaction_description ilike '%shake to win%')
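
Putting both points together for the query in the question, one possible shape is to aggregate each source per date in its own CTE and only then join the one-row-per-date results. This is a sketch under assumptions, not a drop-in replacement: it assumes share_shake_costs is meant to be the SUM of transaction_amount (as in the question's second query), that fbcosts is the total spend per date, and that active_users keeps its original meaning. The CTE names and the dt alias are made up.

```sql
WITH active AS (
    -- Users per signup date that have at least one card transaction.
    -- COUNT(DISTINCT ...) is not inflated by the 1:n join.
    SELECT date(u.created_at) AS dt, COUNT(DISTINCT c.user_id) AS active_users
    FROM dbname.users AS u
    LEFT JOIN dbname.card_transaction AS c ON c.user_id = u.id
    GROUP BY 1
),
fb AS (
    -- One row per date with the total Facebook spend.
    SELECT date(date_start) AS dt, SUM(spend) AS fbcosts
    FROM facebookads.insights
    GROUP BY 1
),
share_shake AS (
    -- One row per date with the share / shake-to-win costs.
    SELECT date(c2.timestamp) AS dt, SUM(c2.transaction_amount) AS share_shake_costs
    FROM dbname.card_transaction AS c2
    WHERE (c2.vendor_transaction_description ILIKE '%share%'
        OR c2.vendor_transaction_description ILIKE '%shake to win%')
    GROUP BY 1
)
SELECT
    a.dt AS date,
    a.active_users,
    COALESCE(fb.fbcosts, 0) AS fbcosts,
    COALESCE(ss.share_shake_costs, 0) AS share_shake_costs,
    -- NULLIF avoids a division-by-zero error on dates with no active users.
    ROUND((COALESCE(fb.fbcosts, 0) + COALESCE(ss.share_shake_costs, 0))
          / NULLIF(a.active_users, 0), 2) AS cac
FROM active AS a
LEFT JOIN fb ON fb.dt = a.dt
LEFT JOIN share_shake AS ss ON ss.dt = a.dt
ORDER BY 1 DESC;
```

The parentheses make no difference while the OR pair is the only filter, but AND binds more tightly than OR, so they start to matter as soon as another condition is added with AND. Note also that in the original query the WHERE clause filters on c2, which discards the unmatched rows the LEFT JOIN would otherwise keep. Filtering inside the CTE avoids that.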