How do I replace every instance of each character in 3 sets of characters with 3 different characters, respectively?

Question

Here is my input:

"Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

I need it to output an array that looks like this:

["Once", " ", "there", " ", "was", " ", "a", ",", "so", " ", "called", ",", "rock", ".", "it", ".", "was", " ", "not", ".", "in", " ", "fact", ",", "a", " ", "big", " ", "rock"]

The input has to go through some rules so that the punctuation ends up like this. The rules are:

spaceDelimiters  = " -_" 
commaDelimiters  = ",():;\""
periodDelimiters = ".!?"

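If there is a space delimiter, it should be replaced with a space. The same goes for the comma and period delimiters. Commas take precedence over spaces, and periods take precedence over commas.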

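I have been able to remove all the delimiters, but I need them as separate elements of the array. There is also a hierarchy: periods take precedence over commas, which take precedence over spaces.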

Maybe my approach is wrong? This is what I have:

import re

def split(string, delimiters):
    regex_pattern = '|'.join(map(re.escape, delimiters))
    return re.split(regex_pattern, string)

In the end everything comes out wrong. It is not even close.

Answer 1

Use the re library to split the text on word boundaries, then apply the replacements in order of precedence.

import re

s="Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

# split the string into tokens along word boundaries
# (splitting on an empty match such as \b requires Python 3.7+)
regex=r"\b"

l=re.split(regex,s)

def replaceDelimeters(token:str):

    # each pattern matches the whole token if it contains at least one
    # delimiter of that kind
    spaceDelimiters  = r"[^- _]*[- _]+[^- _]*"
    commaDelimiters  = r"[^,():;\"]*[,():;\"]+[^,():;\"]*"
    periodDelimiters = r"[^.!?]*[.!?]+[^.!?]*"

    # substitute in order of precedence: period, then comma, then space
    token=re.sub(periodDelimiters,".",token)
    token=re.sub(commaDelimiters,",",token)
    token=re.sub(spaceDelimiters," ",token)
    return token

# apply to every non-empty token and print the result
result=[replaceDelimeters(token) for token in l if token!=""]
print(result)
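
One caveat with splitting on \b: the underscore counts as a word character, so a token such as foo_bar is never split and the space pattern would then replace the whole token with a single space. As an alternative, here is a minimal sketch that splits directly on runs of delimiter characters and keeps them by using a capturing group in re.split. The names SPACE_DELIMITERS, COMMA_DELIMITERS, PERIOD_DELIMITERS, DELIMITER_RUN, normalize_run and tokenize are hypothetical and chosen for illustration; they are not part of the original answer.

```python
import re

# Delimiter sets taken from the question.
SPACE_DELIMITERS = " -_"
COMMA_DELIMITERS = ",():;\""
PERIOD_DELIMITERS = ".!?"

# One character class covering every delimiter. The capturing group makes
# re.split keep each run of delimiters in the result instead of dropping it.
ALL_DELIMITERS = SPACE_DELIMITERS + COMMA_DELIMITERS + PERIOD_DELIMITERS
DELIMITER_RUN = re.compile("([" + re.escape(ALL_DELIMITERS) + "]+)")


def normalize_run(run):
    """Collapse a run of delimiters to its highest-precedence replacement."""
    if any(ch in PERIOD_DELIMITERS for ch in run):
        return "."
    if any(ch in COMMA_DELIMITERS for ch in run):
        return ","
    return " "


def tokenize(text):
    """Return words and normalized delimiters as separate list items."""
    parts = DELIMITER_RUN.split(text)
    tokens = []
    # With a single capturing group, even indices hold the text between
    # delimiters and odd indices hold the captured delimiter runs.
    for index, part in enumerate(parts):
        if part == "":
            continue  # empty strings appear when the text starts or ends with a delimiter
        tokens.append(normalize_run(part) if index % 2 else part)
    return tokens


s = "Once there     was a (so-called) rock. it.,was not! in fact, a big rock."
print(tokenize(s))
```

Like the answer above, this keeps the final "." as the last item, which the expected output in the question omits. If that trailing delimiter is unwanted, it can be dropped from the end of the list afterwards.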
