I'm trying to find the duplicates that are being generated in a particular dataset.
My original query was:
SELECT
b.admidate, a.token_person_id,
COUNT(DISTINCT a.token_person_id) AS patientCount
FROM
new_alzheimers_agegroup2 a
LEFT JOIN
inpatient_mapping b ON b.token_person_id = a.token_person_id
AND a.indexdate = b.disdate
ORDER BY
a.token_person_id
However, compared with the original data, I get 1,500 duplicate rows. How can I write a query that finds these duplicates?
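Duplicates like this usually come from the join. The LEFT JOIN returns one row for every matching row in inpatient_mapping, so a patient with more than one inpatient record whose disdate equals their indexdate appears more than once. To identify the duplicates, write a query that groups by the columns you suspect are causing them and counts the occurrences. Any group with a count greater than 1 is a duplicate.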
SELECT
b.admidate,
a.token_person_id,
COUNT(*) AS occurrenceCount
FROM
new_alzheimers_agegroup2 a
LEFT JOIN
inpatient_mapping b ON b.token_person_id = a.token_person_id
AND a.indexdate = b.disdate
GROUP BY
b.admidate,
a.token_person_id
HAVING
COUNT(*) > 1
ORDER BY
occurrenceCount DESC;
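To see the grouping approach in action, here is a minimal, self-contained sketch that uses an in-memory SQLite database. The table and column names come from the question, but the rows are hypothetical sample data, and SQLite stands in for whatever database you actually use. It runs the answer's query unchanged. It also runs two additional checks (assumptions, not part of the original answer). The first groups by a.token_person_id alone, which also catches a patient who matches several inpatient rows with different admidate values. The second counts how many extra rows the join adds compared with the base table.

```python
import sqlite3

# Shared join condition, taken from the question.
JOIN = """
    FROM new_alzheimers_agegroup2 a
    LEFT JOIN inpatient_mapping b
        ON b.token_person_id = a.token_person_id
       AND a.indexdate = b.disdate
"""

def find_duplicates():
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data for illustration only.
    conn.executescript("""
        CREATE TABLE new_alzheimers_agegroup2 (token_person_id INTEGER, indexdate TEXT);
        CREATE TABLE inpatient_mapping (token_person_id INTEGER, admidate TEXT, disdate TEXT);
        INSERT INTO new_alzheimers_agegroup2 VALUES
            (1, '2020-01-10'), (2, '2020-02-05'), (3, '2020-03-01');
        INSERT INTO inpatient_mapping VALUES
            (1, '2020-01-01', '2020-01-10'),  -- person 1: two matches,
            (1, '2020-01-03', '2020-01-10'),  -- different admidate
            (2, '2020-02-01', '2020-02-05'),  -- person 2: two identical
            (2, '2020-02-01', '2020-02-05');  -- matching rows
    """)

    # The answer's query: repeated (admidate, person) pairs.
    pairs = conn.execute(f"""
        SELECT b.admidate, a.token_person_id, COUNT(*) AS occurrenceCount
        {JOIN}
        GROUP BY b.admidate, a.token_person_id
        HAVING COUNT(*) > 1
        ORDER BY occurrenceCount DESC
    """).fetchall()

    # Person-level check: any patient that produces more than one joined row.
    persons = conn.execute(f"""
        SELECT a.token_person_id, COUNT(*) AS joinedRows
        {JOIN}
        GROUP BY a.token_person_id
        HAVING COUNT(*) > 1
        ORDER BY joinedRows DESC, a.token_person_id
    """).fetchall()

    # Extra rows introduced by the join (joined row count minus base row count).
    (extra,) = conn.execute(f"""
        SELECT (SELECT COUNT(*) {JOIN})
             - (SELECT COUNT(*) FROM new_alzheimers_agegroup2)
    """).fetchone()

    conn.close()
    return pairs, persons, extra

if __name__ == "__main__":
    print(find_duplicates())
```

With this sample data, the answer's query flags only person 2, because person 1's two matches have different admidate values. The person-level check flags both patients, and the join adds 2 extra rows. If the number of extra rows matches the 1,500 you are seeing, the duplicates come from the join rather than from the source table.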