Split text by paragraph

Question

I have sample text like this:

 ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.

I want to split it into two paragraphs with a regular expression rather than str.split, so I tried this:

In [18]: para = re.findall(r'## .+', content)
In [19]: para
Out[19]: ['## Paragraph 1', '## Paragraph 2']

The output I want is the complete paragraphs, separated from each other:

['## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n',
'## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']

How can I do this?

Answer 1

You can try this:

import re
s = " ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug."
# Split at each newline that is immediately followed by a "## Paragraph N" header.
paragraphs = re.split(r'\n(?=## Paragraph \d+)', s)

Output:

 [' ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n', 
 '## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']
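Note that in the output above the first paragraph ends with a single `\n`, because `re.split` consumes the newline it splits on, while the question asked for the trailing `\n\n` to be kept. A minimal sketch of one alternative is to stay with `re.findall`, as the question attempted, and let the match run up to the next header. The sample text below is a shortened stand-in for the question's text, `split_paragraphs` is a hypothetical helper name, and the pattern assumes every header starts at the very beginning of a line with no leading whitespace.

```python
import re

# Shortened stand-in for the question's text (an assumption for illustration).
content = (
    "## Paragraph 1\n\nFirst body line.\n\nSecond body line:\n\n"
    "## Paragraph 2\n\nOnly body line."
)

def split_paragraphs(text):
    """Return each '## ' header together with the text that follows it."""
    # re.M lets ^ match at the start of every line; re.S lets . cross newlines.
    # The lazy .+? stops right before the next header line or at the end of the string.
    return re.findall(r"^## .+?(?=^## |\Z)", text, flags=re.M | re.S)

paragraphs = split_paragraphs(content)
print(paragraphs)
```

Because nothing is consumed between matches, joining the pieces back together reproduces the original string.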

Answer 2

You can try the built-in str.split method:

string = '''I am new to python # please help me '''
data = string.split('#')
print(data)

Output:

['I am new to python ', ' please help me ']
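One caveat when applying this to the question's text: `str.split('#')` splits on every `#`, including both characters of each `## ` header and the `#module-sys` fragment inside the URLs. The sketch below illustrates this on a short stand-in string (an assumption, not the full text from the question) and compares it with splitting only at header lines.

```python
import re

# Short stand-in for the question's text, keeping one URL with a '#' fragment.
text = (
    "## A\n\nSee [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys).\n\n"
    "## B\n\nDone."
)

# Splits on all five '#' characters, so the URL is cut in half
# and empty strings appear between the doubled '#' of each header.
naive = text.split('#')

# Splits only at a newline that is followed by a '## ' header.
by_header = re.split(r'\n(?=## )', text)

print(len(naive))
print(len(by_header))
```

For text like the question's, the header-anchored split is therefore likely the safer choice.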
