My Python lists look like this:
x1 = ['lock-service',
'jenkins-service',
'xyz-reporting-service',
'ansible-service',
'harbor-service',
'version-service',
'jira-service',
'kubernetes-service',
'capo-service',
'permission-service',
'artifactory-service',
'vault-service',
'harbor-service-prod',
'rundeck-service',
'cruise-control-service',
'artifactory-service.xyz.abc.cloud',
'helm-service',
'Capo Service',
'rocket-chat-service',
'reporting-service',
'bitbucket-service',
'rocketchat-service']
or
x2 = ['journal-service',
'lock-service',
'jenkins-service',
'xyz-reporting-service',
'ansible-service',
'harbor-service',
'version-service',
'jira-service',
'kubernetes-service',
'capo-service',
'permission-service',
'artifactory-service',
'vault-service',
'rundeck-service',
'cruise-control-service',
'helm-service',
'database-ticket-service',
'rocket-chat-service',
'ansible-dpservice',
'reporting-service',
'bitbucket-service',
'rocketchat-service']
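As you can see, both lists contain duplicate values that appear in different forms, for example:
In list 1:
In list 2:
I need a generic solution that works beyond these example lists:
How can I do this in Python 3.11?
From this post: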
!pip install thefuzz
x1 = ['lock-service',
'jenkins-service',
'xyz-reporting-service',
'ansible-service',
'harbor-service',
'version-service',
'jira-service',
'kubernetes-service',
'capo-service',
'permission-service',
'artifactory-service',
'vault-service',
'harbor-service-prod',
'rundeck-service',
'cruise-control-service',
'artifactory-service.xyz.abc.cloud',
'helm-service',
'Capo Service',
'rocket-chat-service',
'reporting-service',
'bitbucket-service',
'rocketchat-service']
from itertools import combinations
from thefuzz import fuzz
[(ratio, a, b) for a, b in combinations(x1, 2) if (ratio := fuzz.partial_ratio(a, b)) > 90]
Output:
[(91, 'lock-service', 'rundeck-service'),
(100, 'xyz-reporting-service', 'reporting-service'),
(100, 'harbor-service', 'harbor-service-prod'),
(100, 'artifactory-service', 'artifactory-service.xyz.abc.cloud'),
(94, 'rocket-chat-service', 'rocketchat-service')]
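The pairs above still need to be merged into groups, and a pair such as 'capo-service' / 'Capo Service' differs only in case and separators. Below is a minimal sketch of one possible way to do that. It uses the standard-library difflib instead of thefuzz, so the scores are not the same as partial_ratio. The helper names normalize and group_similar and the 0.9 threshold are my own assumptions, not part of the original answer, and the threshold will likely need tuning for other data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_similar(names, threshold=0.9):
    """Group names whose normalized forms score >= threshold (hypothetical helper).

    Pairs are compared with difflib.SequenceMatcher.ratio() and merged with a
    small union-find, so a~b and b~c end up in the same group.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [normalize(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Only report groups that actually contain near-duplicates.
    return [g for g in groups.values() if len(g) > 1]


sample = ['lock-service', 'rundeck-service', 'capo-service',
          'Capo Service', 'rocket-chat-service', 'rocketchat-service']
duplicate_groups = group_similar(sample)
print(duplicate_groups)
```

Because this compares whole strings rather than the best matching substring, 'lock-service' and 'rundeck-service' are not grouped here. The trade-off is that suffix variants such as 'harbor-service-prod' may fall below the threshold, so if those should count as duplicates, lowering the threshold or keeping fuzz.partial_ratio for the pairwise score may be the better fit.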