How do I replace every instance of each character in 3 groups of characters with 3 different characters?


Here is my input:

"Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

I need it to output an array that looks like this:

["Once", " ", "there", " ", "was", " ", "a", ",", "so", " ", "called", ",", "rock", ".", "it", ".", "was", " ", "not", ".", "in", " ", "fact", ",", "a", " ", "big", " ", "rock"]

The input has to go through a few rules to get the punctuation into that form. The rules are:

spaceDelimiters  = " -_" 
commaDelimiters  = ",():;\""
periodDelimiters = ".!?"

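If there is a space delimiter, it should be replaced with a space. The same goes for the comma and period delimiters, which become a comma and a period respectively. Comma takes precedence over space, and period takes precedence over comma.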
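As a rough illustration of the first half of these rules (mapping each delimiter character to the character for its group), here is a minimal sketch using `str.translate`. The function name `normalize_delimiters` is hypothetical and not part of the original post. It does not collapse runs of adjacent delimiters, so the precedence rule still has to be applied separately.

```python
# Hypothetical helper: map every delimiter to the replacement for its group.
SPACE_DELIMITERS = " -_"
COMMA_DELIMITERS = ",():;\""
PERIOD_DELIMITERS = ".!?"

# Build one translation table covering all three groups.
TABLE = str.maketrans({
    **dict.fromkeys(SPACE_DELIMITERS, " "),
    **dict.fromkeys(COMMA_DELIMITERS, ","),
    **dict.fromkeys(PERIOD_DELIMITERS, "."),
})

def normalize_delimiters(text):
    # Character-by-character replacement; adjacent delimiters are NOT merged.
    return text.translate(TABLE)

print(normalize_delimiters("(so-called) rock!"))  # ,so called, rock.
```

Note that `") "` becomes `", "` here, which is why a second step is needed to reduce each run of delimiters to the single highest-precedence character.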

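I have been able to remove all the delimiters, but I need them as separate elements of the array. There is also a hierarchy: period takes precedence over comma, which takes precedence over space.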

Maybe my approach is wrong? This is what I have:

import re

def split(string, delimiters):
    regex_pattern = '|'.join(map(re.escape, delimiters))
    return re.split(regex_pattern, string)

The result ends up all wrong. It's not even close.
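One likely reason the delimiters vanish: `re.split` discards whatever the pattern matches unless the pattern is wrapped in a capturing group. The sketch below, with a hypothetical helper name `split_keep`, keeps each run of delimiters as its own list element. It only shows how to retain the delimiters and does not yet apply the precedence rules.

```python
import re

def split_keep(string, delimiters):
    # A character class matches any single delimiter, "+" groups a run of them,
    # and the capturing parentheses make re.split return the runs as well.
    pattern = "([" + re.escape(delimiters) + "]+)"
    return re.split(pattern, string)

print(split_keep("a (so-called) rock.", " -_,():;\".!?"))
# ['a', ' (', 'so', '-', 'called', ') ', 'rock', '.', '']
```

The trailing empty string appears because the input ends with a delimiter, so it would need to be filtered out.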

python arrays regex string
1 Answer

Use the re library to split the text on word boundaries, then apply the replacements in order of precedence:

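import re

s = "Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

# split the string into tokens along word boundaries
# (splitting on an empty match such as \b requires Python 3.7+)
regex = r"\b"

l = re.split(regex, s)

def replaceDelimeters(token: str):

    # for each token, identify whether it contains a delimiter
    # caveat: \b treats "_" as a word character, so a token such as "a_b"
    # is not split and would be replaced by a single space
    spaceDelimiters  = r"[^- _]*[- _]+[^- _]*"
    commaDelimiters  = r"[^,():;\"]*[,():;\"]+[^,():;\"]*"
    periodDelimiters = r"[^.!?]*[.!?]+[^.!?]*"

    # substitute the replacement, highest precedence first
    token = re.sub(periodDelimiters, ".", token)
    token = re.sub(commaDelimiters, ",", token)
    token = re.sub(spaceDelimiters, " ", token)
    return token

# apply to every non-empty token
# note: the result ends with "." because the input ends with a period
result = [replaceDelimeters(token) for token in l if token != ""]
print(result)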
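For comparison, here is a sketch of an alternative that does not rely on `\b`: it splits on runs of the delimiter characters themselves, so `_` is treated as a delimiter too. The names `GROUPS`, `collapse` and `tokenize` are hypothetical. It also assumes that delimiters at the very start and end of the input should be dropped, since the expected output in the question does not end with `"."`.

```python
import re

# (delimiter characters, replacement), listed from highest to lowest precedence.
GROUPS = [(".!?", "."), (",():;\"", ","), (" -_", " ")]
ALL_DELIMITERS = "".join(chars for chars, _ in GROUPS)
# Capturing group so that re.split keeps the delimiter runs in its result.
RUN_PATTERN = re.compile("([" + re.escape(ALL_DELIMITERS) + "]+)")

def collapse(run):
    # Return the replacement for the highest-precedence group present in the run.
    for chars, replacement in GROUPS:
        if any(ch in chars for ch in run):
            return replacement
    return run  # no delimiter found; leave unchanged

def tokenize(text):
    # Assumption: leading and trailing delimiters are discarded.
    text = text.strip(ALL_DELIMITERS)
    pieces = RUN_PATTERN.split(text)
    # With a capturing group, odd indices hold the delimiter runs.
    return [collapse(p) if i % 2 else p for i, p in enumerate(pieces) if p]

s = "Once there     was a (so-called) rock. it.,was not! in fact, a big rock."
print(tokenize(s))
```

If the trailing period should be kept instead, removing the `strip` call and relying on the `if p` filter would be one way to do it.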