Why does SQL perform an inner join when an outer join is requested?


I have two tables that I want to combine with a SQL outer join and then fetch the result. The exact SQL query (the one with the problem) is:

SELECT
    LEFT(a.cusip, 6) AS cusip6, 
    a.date, a.prc, a.ret, a.vol, a.spread, a.shrout,
    b.epsf12, (b.seqq-b.pstkq) / b.cshoq AS bps
FROM
    crsp.msf a 
FULL JOIN 
    compa.fundq b ON (LEFT(a.cusip, 6) = LEFT(b.cusip, 6) 
                  AND a.date = b.datadate)
WHERE 
    (b.datadate BETWEEN '2010-01-01' and '2015-12-31') 
    AND (a.date BETWEEN '2010-01-01' and '2015-12-31') 
    AND (b.cshoq > 0)

This returns 670,293 rows.

But when I fetch the two datasets separately and outer-join them with R's merge(), I get 1,182,093 rows. The two separate queries I use are:

SELECT  
    LEFT(cusip, 6) AS cusip6, date, prc, ret, vol, spread, shrout 
FROM
    crsp.msf 
WHERE 
    date BETWEEN '2010-01-01' and '2015-12-31'

SELECT 
    LEFT(cusip, 6) AS cusip6, datadate AS date, epsf12, 
    (seqq-pstkq)/cshoq AS bps 
FROM
    compa.fundq 
WHERE 
    datadate BETWEEN '2010-01-01' and '2015-12-31' 
    AND cshoq > 0

I then merge them (outer join) using:

merge(x = data_1, y = data_2, by.x = c("cusip6", "date"), by.y = c("cusip6", "date"), all = T)

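This returns 1,182,093 rows, which is correct. So although I explicitly ask for an outer join, my original (first) SQL query is effectively performing an inner join. The R merge() call below, which is an inner join, returns 670,293 rows, confirming that the data fetched by the SQL query really is the inner-join result.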

merge(x = data_1, y = data_2, by.x = c("cusip6", "date"), by.y = c("cusip6", "date"))

What is wrong with my SQL query?

sql join merge
1 Answer

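Because the WHERE clause is applied after the JOIN. At that point the unmatched rows contain NULLs: a row that exists only in a has NULL in every b column, and vice versa. A condition such as b.datadate BETWEEN ... or b.cshoq > 0 evaluates to NULL (unknown) for those rows, so they do not pass the WHERE clause. Only rows that matched on both sides survive, which is exactly the inner-join result.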
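The effect can be reproduced on toy data. The sketch below uses Python's built-in sqlite3 module with two made-up tables, a and b (hypothetical names and values, not the CRSP/Compustat data). It uses a LEFT JOIN rather than a FULL JOIN, on the assumption that the local SQLite build may be too old to support FULL JOIN; the way WHERE treats NULLs is the same for any outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT, d TEXT, prc REAL);
    CREATE TABLE b (k TEXT, d TEXT, cshoq REAL);
    INSERT INTO a VALUES ('AAA', '2010-01-31', 10.0),
                         ('BBB', '2010-01-31', 20.0);
    INSERT INTO b VALUES ('AAA', '2010-01-31', 5.0);
""")

# A comparison against NULL yields NULL (unknown), not true.
null_test = conn.execute("SELECT NULL > 0").fetchone()[0]

# Filter in WHERE: evaluated after the join, so the unmatched 'BBB' row
# (where b.cshoq is NULL) is discarded.
where_rows = conn.execute("""
    SELECT a.k, b.cshoq
    FROM a LEFT JOIN b ON a.k = b.k AND a.d = b.d
    WHERE b.cshoq > 0
    ORDER BY a.k
""").fetchall()

# Filter in a subquery: b is filtered before the join, so 'BBB' survives
# with NULL in the b columns.
sub_rows = conn.execute("""
    SELECT a.k, b.cshoq
    FROM a LEFT JOIN (SELECT * FROM b WHERE cshoq > 0) b
         ON a.k = b.k AND a.d = b.d
    ORDER BY a.k
""").fetchall()

print(null_test)   # None
print(where_rows)  # [('AAA', 5.0)]
print(sub_rows)    # [('AAA', 5.0), ('BBB', None)]
```

The WHERE version returns one row, the same as an inner join would; the subquery version keeps both rows of a.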

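If you need both an OUTER JOIN and filters, put the filters in the JOIN condition or in subqueries so that they are applied before the join rather than after it. Note that with a FULL JOIN, a filter in the ON clause only decides which rows match; rows that fail it are still returned as unmatched, so filtering each table in a subquery is the form to use here: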

SELECT
    LEFT(a.cusip, 6) AS cusip6, 
    a.date, a.prc, a.ret, a.vol, a.spread, a.shrout,
    b.epsf12, (b.seqq-b.pstkq) / b.cshoq AS bps
FROM
    (SELECT * FROM crsp.msf WHERE date BETWEEN '2010-01-01' and '2015-12-31') a
FULL JOIN 
    (SELECT * FROM compa.fundq WHERE datadate BETWEEN '2010-01-01' and '2015-12-31' AND cshoq > 0) b
        ON  LEFT(a.cusip, 6) = LEFT(b.cusip, 6) 
        AND a.date = b.datadate
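One thing to watch when comparing this result with merge(): in a FULL JOIN, rows that exist only in b have NULL in every a column, so LEFT(a.cusip, 6) and a.date are NULL for those rows, whereas merge(..., all = T) fills the key columns from whichever side has them. Wrapping the keys in COALESCE gives the same behaviour. The sketch below illustrates this with Python's sqlite3 module and made-up tables (hypothetical names and values). As an assumption for portability, the FULL JOIN is emulated with a LEFT JOIN plus the unmatched rows of b, since older SQLite builds lack FULL JOIN; on a database that supports it, the COALESCE expressions can go directly into the select list of the FULL JOIN query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT, d TEXT, prc REAL);
    CREATE TABLE b (k TEXT, d TEXT, eps REAL);
    INSERT INTO a VALUES ('AAA', '2010-01-31', 10.0),
                         ('BBB', '2010-01-31', 20.0);
    INSERT INTO b VALUES ('AAA', '2010-01-31', 1.5),
                         ('CCC', '2010-03-31', 2.5);
""")

# Emulated FULL JOIN: every row of a with its match in b (if any),
# plus the rows of b that have no match in a.
# Column 1 is the raw a-side key; columns 2 and 3 are the COALESCEd keys.
full_rows = conn.execute("""
    SELECT a.k, COALESCE(a.k, b.k), COALESCE(a.d, b.d), a.prc, b.eps
    FROM a LEFT JOIN b ON a.k = b.k AND a.d = b.d
    UNION ALL
    SELECT a.k, COALESCE(a.k, b.k), COALESCE(a.d, b.d), a.prc, b.eps
    FROM b LEFT JOIN a ON a.k = b.k AND a.d = b.d
    WHERE a.k IS NULL
    ORDER BY 2
""").fetchall()

for row in full_rows:
    print(row)
# ('AAA', 'AAA', '2010-01-31', 10.0, 1.5)
# ('BBB', 'BBB', '2010-01-31', 20.0, None)
# (None, 'CCC', '2010-03-31', None, 2.5)
```

For the 'CCC' row the raw a-side key is NULL, while the COALESCEd key and date are filled in from b, which is how merge() reports keys for rows present on only one side.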