Split text by paragraphs

Problem description  Votes: 0  Answers: 2

I have sample text like this:

 ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.

I want to split it into two paragraphs with a regular expression rather than str.split, so I tried this:

In [18]: para = re.findall(r'## .+', content)
In [19]: para
Out[19]: ['## Paragraph 1', '## Paragraph 2']

My intended output is the complete paragraphs, separated:

['## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n',
'## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']

How can I do this?

python regex
2 Answers
1
vote

You can try this:

import re
s = " ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug."
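# Split on the newline that is immediately followed by a "## Paragraph N" heading.
# The lookahead (?=...) keeps the heading itself in the following piece.
paragraphs = re.split(r'\n(?=## Paragraph \d+)', s)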

Output:

 [' ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n', 
 '## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']
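
As a complement to the split above, here is a minimal sketch of a `re.findall` variant that keeps the trailing blank line of each paragraph, as in the question's intended output. It also shows why the original attempt returned only the headings: by default `.` does not match a newline. The `content` string is a shortened stand-in for the question's text (an assumption for brevity), and the pattern assumes every heading starts at the beginning of a line.

```python
import re

# Shortened stand-in for the question's text (assumed sample, not the full original).
content = (
    "## Paragraph 1\n\nFirst body line.\n\nSecond body line:\n\n"
    "## Paragraph 2\n\nAnother body.\n\nLast line."
)

# Without re.DOTALL, "." stops at the first newline, so only the headings match.
headings = re.findall(r"## .+", content)

# With re.DOTALL, "." also matches newlines. The lazy ".*?" stops right before
# the next heading at the start of a line (re.MULTILINE) or at the end of the text.
paragraphs = re.findall(
    r"^## .*?(?=^## |\Z)", content, flags=re.DOTALL | re.MULTILINE
)

print(headings)
print(paragraphs)
```

Because nothing is consumed between matches, joining the pieces gives back the original text. A heading preceded by spaces on its line would not be matched by `^## `, so strip or adjust the pattern if the input is indented.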

0
votes

You can try the built-in split method:

string = '''I am new to python # please help me '''
data = string.split('#')
print(data)

Output:

['I am new to python ', ' please help me ']
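
Applied to text like the question's, `str.split` removes the delimiter, so the heading marker has to be put back by hand. The following is a rough sketch under the assumption that the literal `"## "` appears only at paragraph headings; the `content` string is a shortened, hypothetical sample rather than the question's full text.

```python
# Shortened, hypothetical sample in the same shape as the question's text.
content = "## Paragraph 1\n\nFirst body.\n\n## Paragraph 2\n\nSecond body."

# str.split drops the "## " delimiter and leaves an empty first item,
# because the text starts with the delimiter.
pieces = content.split("## ")

# Restore the marker and skip the empty leading piece.
paragraphs = ["## " + piece for piece in pieces if piece]

print(paragraphs)
```

Note that this also splits inside deeper headings such as `### `, which is one reason the regex-based answer is more precise.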
