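I have two lists and am trying to join them on a key using core Python only, without any other libraries. The key column is the first value of each tuple: 100, 101 and 102.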
List 1 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
List 2 = [(100, 'Engineer', '515.123.4567'), (101, 'Doctor', '515.123.4568')]
Expected result
Inner join
[(100, 'Steven', '515.123.4567', 'Engineer'), (101, 'Neena', '515.123.4568', 'Doctor')]
Left outer join
[(100, 'Steven', '515.123.4567', 'Engineer'), (101, 'Neena', '515.123.4568', 'Doctor'), (102, 'Lex', '515.123.4569', None)]
This is easy to do with pandas, but I am trying to do it in plain Python.
I tried using collections and itertools, but did not get the expected result.
List_1 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
List_2 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568')]
def inner_join(list_1, list_2):
    return [entries for entries in list_1 if entries in list_2]
def left_outer_join(list_1, list_2):
    return [entries if entries in list_2 else None for entries in list_1]
print(inner_join(List_1, List_2))
print(left_outer_join(List_1, List_2))
This gives the following result:
[(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568')]
[(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), None]
The question has been updated with a different example. Here are two functions that produce the expected result:
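list1 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
list2 = [(100, 'Engineer', '515.123.4567'), (101, 'Doctor', '515.123.4568')]
def inner_join(list1: list[tuple], list2: list[tuple]):
    # Keys (first tuple element) of list2, in the same order as list2.
    ids_list2 = [item[0] for item in list2]
    return_list = []
    for item in list1:
        if item[0] in ids_list2:
            # Append the second field of the matching list2 row.
            job = list2[ids_list2.index(item[0])][1]
            return_list.append(item + (job,))
    return return_list
print(inner_join(list1, list2))
# [(100, 'Steven', '515.123.4567', 'Engineer'), (101, 'Neena', '515.123.4568', 'Doctor')]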
For the left outer join, you can easily modify the above:
def left_outer_join(list1: list[tuple], list2: list[tuple]):
    ids_list2 = [item[0] for item in list2]
    return_list = []
    for item in list1:
        if item[0] in ids_list2:
            job = list2[ids_list2.index(item[0])][1]
            return_list.append(item + (job,))
        else:
            # No match in list2: keep the list1 row and pad with None.
            return_list.append(item + (None,))
    return return_list
print(left_outer_join(list1, list2))
# [(100, 'Steven', '515.123.4567', 'Engineer'), (101, 'Neena', '515.123.4568', 'Doctor'), (102, 'Lex', '515.123.4569', None)]
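Both functions look up each key with `list.index`, which scans `list2` once per row of `list1`. As a possible alternative, here is a minimal sketch that builds a dict keyed on the first tuple element instead. The name `join_on_key` and its `how` parameter are my own, not from the question, and the sketch assumes that keys in the second list are unique (a later duplicate would overwrite an earlier one).

```python
def join_on_key(left, right, how="inner"):
    """Join two lists of tuples on their first element.

    Assumes keys in `right` are unique (hypothetical helper, not from the question).
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how!r}")

    # Map each key to the second field of the matching row in `right`.
    lookup = {row[0]: row[1] for row in right}

    joined = []
    for row in left:
        if row[0] in lookup:
            joined.append(row + (lookup[row[0]],))
        elif how == "left":
            # Keep unmatched rows and pad the missing field with None.
            joined.append(row + (None,))
    return joined


list1 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
list2 = [(100, 'Engineer', '515.123.4567'), (101, 'Doctor', '515.123.4568')]

print(join_on_key(list1, list2, how="inner"))
print(join_on_key(list1, list2, how="left"))
```

With the data from the question, the inner join should return the rows for 100 and 101, and the left join should additionally keep the row for 102 with `None` as its last field.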
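I'm not sure "join" is the right term here. A join enriches one dataset with data from another related dataset, based on a common key. What you have illustrated is the intersection of two lists (the "inner join"), which can be modified to simulate what you call the "left outer".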
list1 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
list2 = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568')]
inner_join = [item for item in list1 if item in list2]
left_outer = [item if item in list2 else None for item in list1]
print(inner_join)
print(left_outer)
# [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568')]
# [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), None]
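To make the distinction between an intersection and a key-based join concrete, here is a small sketch using the data from the top of the question, where the second list holds job titles. The names `people`, `jobs`, and `job_by_id` are illustrative only. Membership tests compare whole tuples, so the intersection comes out empty, while a lookup on the first element still pairs the rows.

```python
people = [(100, 'Steven', '515.123.4567'), (101, 'Neena', '515.123.4568'), (102, 'Lex', '515.123.4569')]
jobs = [(100, 'Engineer', '515.123.4567'), (101, 'Doctor', '515.123.4568')]

# Intersection: a row matches only if the entire tuple is identical.
intersection = [row for row in people if row in jobs]

# Key-based join: rows match when their first elements are equal.
job_by_id = {row[0]: row[1] for row in jobs}
enriched = [row + (job_by_id[row[0]],) for row in people if row[0] in job_by_id]

print(intersection)  # []
print(enriched)
# [(100, 'Steven', '515.123.4567', 'Engineer'), (101, 'Neena', '515.123.4568', 'Doctor')]
```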