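I have a peculiar de-duplication problem. I can identify the duplicate records easily enough, but I need to do what is essentially a merge on some of the accompanying data.
Here's the problem. My table looks a bit like this: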
CREATE TABLE `People` (
`PersonId` int(11) NOT NULL AUTO_INCREMENT,
`Address` varchar(255) DEFAULT NULL,
`Title` varchar(50) DEFAULT NULL,
`Forename` varchar(150) DEFAULT NULL,
`Surname` varchar(150) DEFAULT NULL,
`FlagOne` bit(1) NOT NULL DEFAULT b'0',
`FlagTwo` bit(1) NOT NULL DEFAULT b'0',
`FlagThree` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`PersonId`)
)
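Duplicate records differ only in their Title and flag values. They are identified as duplicates by having the same Address, Forename and Surname fields: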
PersonId  Address         Title  Forename  Surname  FlagOne  FlagTwo  FlagThree
1         6 Smith Street  Mrs    Jane      Doe      1        0        0
2         6 Smith Street  Ms     Jane      Doe      0        1        0
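I can't work out how to merge these two into a single record that retains all of the positive flags. It doesn't matter which of the two original records is kept, so using PersonId to choose between them is fine. Something like this would be the ideal result: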
PersonId  Address         Title  Forename  Surname  FlagOne  FlagTwo  FlagThree
2         6 Smith Street  Ms     Jane      Doe      1        1        0
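I know how to do an update based on a join, but I'm not sure how to express the conditions needed to get this particular result.
Since you say you know how to update using a JOIN, something like this gives you the merged values: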
SELECT MAX(PersonId),
       Address,
       MAX(Title),
       Forename,
       Surname,
       MAX(FlagOne),
       MAX(FlagTwo),
       MAX(FlagThree)
FROM People
GROUP BY Address,
         Forename,
         Surname
Then you need to delete the duplicates:
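-- The inner derived table is needed because MySQL will not let a DELETE
-- select directly from the table it is deleting from.
DELETE FROM People
WHERE PersonId IN (SELECT PersonId
                   FROM (SELECT MIN(PersonId) AS PersonId
                         FROM People
                         GROUP BY Address,
                                  Forename,
                                  Surname
                         HAVING COUNT(*) > 1) AS dupes)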
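This assumes each record is duplicated only once. If you have three or more rows with the same Address, Forename and Surname, the DELETE removes only the lowest PersonId in each group, so a different approach is needed.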
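To make the three-row caveat concrete, here is a small sketch that uses Python's built-in sqlite3 module instead of MySQL, purely so it can be run anywhere. The cut-down table, the sample rows and the helper names (`make_people`, `remaining_ids`) are assumptions made for illustration. It compares deleting MIN(PersonId) per group with keeping only MAX(PersonId) per group.

```python
import sqlite3


def make_people():
    """Build an in-memory table holding three rows for the same person."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE People (PersonId INTEGER PRIMARY KEY, "
        "Address TEXT, Forename TEXT, Surname TEXT)"
    )
    conn.executemany(
        "INSERT INTO People VALUES (?, ?, ?, ?)",
        [(i, "6 Smith Street", "Jane", "Doe") for i in (1, 2, 3)],
    )
    return conn


def remaining_ids(conn):
    """Return the PersonId values still present, in ascending order."""
    rows = conn.execute("SELECT PersonId FROM People ORDER BY PersonId")
    return [row[0] for row in rows]


# Deleting MIN(PersonId) per duplicated group removes only one row per group.
by_min = make_people()
by_min.execute("""
    DELETE FROM People
    WHERE PersonId IN (SELECT MIN(PersonId) FROM People
                       GROUP BY Address, Forename, Surname
                       HAVING COUNT(*) > 1)
""")
ids_after_min_delete = remaining_ids(by_min)  # rows 2 and 3 are still duplicates

# Keeping MAX(PersonId) per group removes every other row, however many exist.
keep_max = make_people()
keep_max.execute("""
    DELETE FROM People
    WHERE PersonId NOT IN (SELECT MAX(PersonId) FROM People
                           GROUP BY Address, Forename, Surname)
""")
ids_after_keep_max = remaining_ids(keep_max)  # only row 3 survives
```

In MySQL the NOT IN subquery would probably need wrapping in a derived table, since MySQL does not accept a DELETE that selects directly from its own target table.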
I think you need to do this in two steps:
1- Update the values:
-- Copy the highest flag values in each group onto the row that will be kept
-- (the one with the highest PersonId).
UPDATE People p
INNER JOIN (
    SELECT MAX(PersonId)  AS PId,
           Address,
           MAX(Title)     AS Title,
           Forename,
           Surname,
           MAX(FlagOne)   AS FlagOne,
           MAX(FlagTwo)   AS FlagTwo,
           MAX(FlagThree) AS FlagThree
    FROM People
    GROUP BY Address,
             Forename,
             Surname ) t
   ON t.Address  = p.Address
  AND t.Forename = p.Forename
  AND t.Surname  = p.Surname
SET p.FlagOne   = t.FlagOne,
    p.FlagTwo   = t.FlagTwo,
    p.FlagThree = t.FlagThree
WHERE p.PersonId = t.PId
2- Delete:
DELETE p
FROM People p
INNER JOIN
People t ON t.Address = p.Address
AND t.Forename = p.Forename
AND t.Surname = p.Surname
WHERE p.PersonId < t.PersonId
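As a rough way to check the two-step idea against the sample data from the question, the sketch below runs it with Python's built-in sqlite3 module. This is an adaptation, not the MySQL statements themselves: SQLite has no `UPDATE ... JOIN` or `DELETE p FROM ... JOIN`, so correlated subqueries and `EXISTS` stand in for them, and the BIT(1) flags are modelled as INTEGER columns. Both substitutions are assumptions made so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE People (
        PersonId  INTEGER PRIMARY KEY,
        Address   TEXT, Title TEXT, Forename TEXT, Surname TEXT,
        FlagOne   INTEGER NOT NULL DEFAULT 0,
        FlagTwo   INTEGER NOT NULL DEFAULT 0,
        FlagThree INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, "6 Smith Street", "Mrs", "Jane", "Doe", 1, 0, 0),
     (2, "6 Smith Street", "Ms", "Jane", "Doe", 0, 1, 0)],
)

# Condition matching rows of the same person; inside each subquery the alias t
# is the inner table and People is the row being updated or deleted.
same_person = ("t.Address = People.Address AND t.Forename = People.Forename "
               "AND t.Surname = People.Surname")

with conn:  # commit both steps together, roll back if either one fails
    # Step 1: give every row the highest flag values found in its group.
    conn.execute(f"""
        UPDATE People
        SET FlagOne   = (SELECT MAX(t.FlagOne)   FROM People t WHERE {same_person}),
            FlagTwo   = (SELECT MAX(t.FlagTwo)   FROM People t WHERE {same_person}),
            FlagThree = (SELECT MAX(t.FlagThree) FROM People t WHERE {same_person})
    """)
    # Step 2: delete every row that has a twin with a higher PersonId.
    conn.execute(f"""
        DELETE FROM People
        WHERE EXISTS (SELECT 1 FROM People t
                      WHERE {same_person} AND t.PersonId > People.PersonId)
    """)

merged = conn.execute("SELECT * FROM People").fetchall()
# merged == [(2, '6 Smith Street', 'Ms', 'Jane', 'Doe', 1, 1, 0)]
```

One thing to watch in either database: Address, Forename and Surname are nullable in the table definition, and rows with a NULL in any of them never compare equal under `=`, so they would be left unmerged.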