How to align the offsets of two strings given a list of substring pairs?

Question

Given two strings a and b that are related by the list of substring pairs in c:

a = "how are you ?"
b = "wie gehst's es dir?"

c = [
 ("how", "wie"), 
 ("are", "gehst's"),
 ("you", "es")
]

What is the best way to produce the resulting offsets:

offsets = [
 ("how", "wie", (0, 3), (0, 3)), 
 ("are", "gehst's", (4, 6), (4, 11)),
 ("you", "es", (7, 9), (12, 14))
]

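ChatGPT suggested the following simple approach:

To generate the desired offsets from the given strings a and b and the list of substring pairs c, we need to find the start and end positions (indices) of each substring of a within a, and of each substring of b within b.

Steps:

Iterate over each pair of substrings in the list c.
Find the start and end positions of the first substring in string a.
Find the start and end positions of the second substring in string b.
Store the pair of substrings together with their positions.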

a = "how are you ?"
b = "wie gehst's es dir?"
c = [
    ("how", "wie"),
    ("are", "gehst's"),
    ("you", "es")
]

# Create the offsets list
offsets = []
for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a)
    end_a = start_a + len(substring_a) - 1
    
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b)
    end_b = start_b + len(substring_b) - 1
    
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a), (start_b, end_b)))

# Output the result
print(offsets)

But is there a better approach, especially when terms are repeated? For example:

a = "how are you ? are you okay ?"
b = "wie gehst's es dir?  geht es dir gut "

c = [
 ("how", "wie"), 
 ("are", "gehst's"),
 ("you", "es"),
 ("are", "geht"), 
 ("you", "es"),
 ("okay", "gut")
]

Answer 1

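str.find() takes optional start and end arguments that restrict where it searches for the substring. So you can pass the previous end_a as the start argument of the next a.find() call, and likewise end_b for b.find().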

offsets = []
end_a = end_b = 0

for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a, end_a)
    end_a = start_a + len(substring_a)
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b, end_b)
    end_b = start_b + len(substring_b)
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a - 1), (start_b, end_b - 1)))

Result:

[('how', 'wie', (0, 2), (0, 2)),
 ('are', "gehst's", (4, 6), (4, 10)),
 ('you', 'es', (8, 10), (12, 13)),
 ('are', 'geht', (14, 16), (21, 24)),
 ('you', 'es', (18, 20), (26, 27)),
 ('okay', 'gut', (22, 25), (33, 35))]
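One caveat with the loop above: str.find() returns -1 when the substring is not found, and the loop would then carry on silently with wrong offsets. The following is a minimal sketch of the same idea wrapped in a function that raises instead. The name align_offsets and the choice to raise ValueError are assumptions made for illustration, not part of the original answer.

```python
def align_offsets(a, b, pairs):
    """Return (sub_a, sub_b, (start_a, end_a), (start_b, end_b)) tuples.

    End indices are inclusive, matching the result shown above.
    Hypothetical helper: the name and the error handling are illustrative.
    """
    offsets = []
    pos_a = pos_b = 0  # search cursors: where the next lookup starts in a and b
    for sub_a, sub_b in pairs:
        # Search only from the end of the previous match onward,
        # so repeated terms resolve to their next occurrence.
        start_a = a.find(sub_a, pos_a)
        start_b = b.find(sub_b, pos_b)
        if start_a == -1 or start_b == -1:
            raise ValueError(f"pair {(sub_a, sub_b)!r} not found after the previous match")
        pos_a = start_a + len(sub_a)
        pos_b = start_b + len(sub_b)
        offsets.append((sub_a, sub_b, (start_a, pos_a - 1), (start_b, pos_b - 1)))
    return offsets
```

Calling align_offsets(a, b, c) with the strings and pairs from the question gives the same list as above. Note that this still assumes the pairs in c appear in the same order as the substrings occur in both a and b.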
