How can I align offsets between two strings, given a list of substring pairs?


Given two strings a and b, and a list c of related substring pairs:

a = "how are you ?"
b = "wie gehst's es dir?"

c = [
 ("how", "wie"), 
 ("are", "gehst's"),
 ("you", "es")
]

What is the best way to produce the resulting offsets:

offsets = [
 ("how", "wie", (0, 2), (0, 2)), 
 ("are", "gehst's", (4, 6), (4, 10)),
 ("you", "es", (8, 10), (12, 13))
]

ChatGPT suggested the following simple approach:

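To generate the desired offsets from the given strings a and b and the list c of substring pairs, we need to find the start and end positions (indices) of the first substring of each pair within a, and of the second substring within b.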

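Steps:

  • Iterate over each pair of substrings in the list c.
  • Find the start and end positions of the first substring in string a.
  • Find the start and end positions of the second substring in string b.
  • Store the pair of substrings together with their positions.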
a = "how are you ?"
b = "wie gehst's es dir?"
c = [
    ("how", "wie"),
    ("are", "gehst's"),
    ("you", "es")
]

# Create the offsets list
offsets = []
for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a)
    end_a = start_a + len(substring_a) - 1
    
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b)
    end_b = start_b + len(substring_b) - 1
    
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a), (start_b, end_b)))

# Output the result
print(offsets)

But is there a better approach, especially when terms are repeated? For example:

a = "how are you ? are you okay ?"
b = "wie gehst's es dir?  geht es dir gut "

c = [
 ("how", "wie"), 
 ("are", "gehst's"),
 ("you", "es"),
 ("are", "geht"), 
 ("you", "es"),
 ("okay", "gut")
]
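
To make the problem concrete, here is a small sketch of what a first-occurrence search does with this input. The variable names terms_a and naive are only for illustration; the code uses nothing beyond the built-in str.find().

```python
a = "how are you ? are you okay ?"

# The a-side terms from the list c above, in order.
terms_a = ["how", "are", "you", "are", "you", "okay"]

# str.find() without a start argument always returns the FIRST occurrence,
# so the repeated terms collapse onto the same start index.
naive = [(term, a.find(term)) for term in terms_a]
print(naive)
# [('how', 0), ('are', 4), ('you', 8), ('are', 4), ('you', 8), ('okay', 22)]
```

The second "are" and "you" get the same start index as the first ones, which is the case I would like to handle.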
python string dynamic-programming offset text-alignment
1 Answer

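str.find() accepts optional start and end arguments that restrict where the substring is searched. So you can pass the previous end_a as the start argument of the next a.find() call, and likewise the previous end_b for b.find().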
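
As a quick illustration of the start argument on its own (a minimal sketch; the variable names text, first, second and missing are just for this example):

```python
text = "how are you ? are you okay ?"

# Without a start argument, the search begins at index 0.
first = text.find("are")

# Resume the search just past the first match to reach the next occurrence.
second = text.find("are", first + len("are"))

# str.find() returns -1 when the substring is not found.
missing = text.find("danke")

print(first, second, missing)  # 4 14 -1
```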

offsets = []
end_a = end_b = 0

for substring_a, substring_b in c:
    # Find the start and end indices for substring_a in string a
    start_a = a.find(substring_a, end_a)
    end_a = start_a + len(substring_a)
    # Find the start and end indices for substring_b in string b
    start_b = b.find(substring_b, end_b)
    end_b = start_b + len(substring_b)
    # Append the result as a tuple
    offsets.append((substring_a, substring_b, (start_a, end_a - 1), (start_b, end_b - 1)))

Result:

[('how', 'wie', (0, 2), (0, 2)),
 ('are', "gehst's", (4, 6), (4, 10)),
 ('you', 'es', (8, 10), (12, 13)),
 ('are', 'geht', (14, 16), (21, 24)),
 ('you', 'es', (18, 20), (26, 27)),
 ('okay', 'gut', (22, 25), (33, 35))]
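
One caveat: str.find() returns -1 when a substring is missing, and the loop above would then silently record wrong offsets. Below is a sketch of the same idea wrapped in a function that raises instead. The function name align_offsets and the choice of ValueError are my own assumptions, not part of the question, and it still assumes the pairs in c appear in the same left-to-right order in both strings.

```python
def align_offsets(a, b, pairs):
    """Return (sub_a, sub_b, (start_a, end_a), (start_b, end_b)) tuples.

    End indices are inclusive. Each search resumes where the previous
    match ended, so repeated terms map to successive occurrences.
    """
    offsets = []
    pos_a = pos_b = 0  # where the next search starts in a and in b
    for sub_a, sub_b in pairs:
        start_a = a.find(sub_a, pos_a)
        start_b = b.find(sub_b, pos_b)
        if start_a == -1 or start_b == -1:
            # Fail loudly rather than store offsets derived from -1.
            raise ValueError(f"pair not found in order: {(sub_a, sub_b)!r}")
        pos_a = start_a + len(sub_a)
        pos_b = start_b + len(sub_b)
        offsets.append((sub_a, sub_b, (start_a, pos_a - 1), (start_b, pos_b - 1)))
    return offsets


a = "how are you ? are you okay ?"
b = "wie gehst's es dir?  geht es dir gut "
c = [
    ("how", "wie"),
    ("are", "gehst's"),
    ("you", "es"),
    ("are", "geht"),
    ("you", "es"),
    ("okay", "gut"),
]
result = align_offsets(a, b, c)
print(result)
```

This is still a single left-to-right pass over each string, so it does not handle pairs whose order differs between a and b.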