Given a and b, which are related through the list of substring pairs in c:
a = "how are you ?"
b = "wie gehst's es dir?"
c = [
("how", "wie"),
("are", "gehst's"),
("you", "es")
]
What is the best way to obtain the resulting offsets:
offsets = [
("how", "wie", (0, 3), (0, 3)),
("are", "gehst's", (4, 6), (4, 11)),
("you", "es", (7, 9), (12, 14))
]
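ChatGPT suggested the following simple approach:
To generate the required offsets from the given strings a and b and the list of substring pairs c, we need to find the start and end positions (indices) of each substring from a within a itself, and of each substring from b within b itself.
Steps: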
a = "how are you ?"
b = "wie gehst's es dir?"
c = [
("how", "wie"),
("are", "gehst's"),
("you", "es")
]
# Create the offsets list
offsets = []
for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a)
    end_a = start_a + len(substring_a) - 1
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b)
    end_b = start_b + len(substring_b) - 1
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a), (start_b, end_b)))
# Output the result
print(offsets)
But is there something more optimal, especially for repeated terms? For example:
a = "how are you ? are you okay ?"
b = "wie gehst's es dir? geht es dir gut "
c = [
("how", "wie"),
("are", "gehst's"),
("you", "es"),
("are", "geht"),
("you", "es"),
("okay", "gut")
]
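To make the problem with repeated terms concrete, here is a small sketch (the names `first` and `second` are only illustrative) showing that a plain `str.find()` call reports the same position every time, even though the term occurs twice:

```python
a = "how are you ? are you okay ?"

# Without any extra arguments, find() always returns the first match,
# so looking up "are" a second time yields the same index again.
first = a.find("are")
second = a.find("are")
print(first, second)   # 4 4

# The term really does occur twice in the string.
print(a.count("are"))  # 2
```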
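str.find() takes optional start and end arguments that restrict where the substring is searched for. So you can pass the previous end_a as the start argument of the next a.find() call, and likewise end_b for b.find().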
offsets = []
end_a = end_b = 0
for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a, end_a)
    end_a = start_a + len(substring_a)
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b, end_b)
    end_b = start_b + len(substring_b)
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a - 1), (start_b, end_b - 1)))
Result:
[('how', 'wie', (0, 2), (0, 2)),
('are', "gehst's", (4, 6), (4, 10)),
('you', 'es', (8, 10), (12, 13)),
('are', 'geht', (14, 16), (20, 23)),
('you', 'es', (18, 20), (25, 26)),
('okay', 'gut', (22, 25), (32, 34))]
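One caveat: `str.find()` returns -1 when the substring is missing, which would silently produce bogus offsets. Below is a minimal sketch of the same idea wrapped in a function that uses `str.index()` instead, which raises `ValueError` on a miss. The function name `find_offsets` and its parameter names are made up for illustration, and it assumes the pairs in c appear in the same order as in both strings.

```python
def find_offsets(a, b, pairs):
    """Return (sub_a, sub_b, (start_a, end_a), (start_b, end_b)) per pair.

    End indices are inclusive. Assumes the pairs occur in order in a and b.
    """
    offsets = []
    pos_a = pos_b = 0  # where the next search starts in a and in b
    for sub_a, sub_b in pairs:
        # index() works like find() but raises ValueError if nothing is found
        start_a = a.index(sub_a, pos_a)
        start_b = b.index(sub_b, pos_b)
        # Continue the next search right after the current match
        pos_a = start_a + len(sub_a)
        pos_b = start_b + len(sub_b)
        offsets.append((sub_a, sub_b, (start_a, pos_a - 1), (start_b, pos_b - 1)))
    return offsets


a = "how are you ?"
b = "wie gehst's es dir?"
c = [("how", "wie"), ("are", "gehst's"), ("you", "es")]
result = find_offsets(a, b, c)
print(result)
# [('how', 'wie', (0, 2), (0, 2)), ('are', "gehst's", (4, 6), (4, 10)), ('you', 'es', (8, 10), (12, 13))]
```

Each search starts where the previous match ended, so every character of a and b is scanned at most about once, and a pair that is missing or out of order fails loudly instead of yielding negative offsets.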