How do I convert a docutils document tree into an HTML string?

Question

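I am trying to use the docutils package to convert ReST to HTML. This answer concisely uses the docutils publish_* convenience functions to do it in one step. The ReST document I want to convert has several sections, and I want to keep those sections separate in the generated HTML. So I want to break the process down:

Parse the ReST into a tree of nodes.
Separate the nodes as appropriate.
Convert the nodes I want into HTML.

It is the third step that I am struggling with. Here is how I do steps one and two: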

from docutils import utils
from docutils.frontend import OptionParser
from docutils.parsers.rst import Parser

# preamble
rst = '*NB:* just an example.'   # will actually have many sections
path = 'some.url.com'
settings = OptionParser(components=(Parser,)).get_default_values()

# step 1
document = utils.new_document(path, settings)
Parser().parse(rst, document)

# step 2
for node in document:
   do_something_with(node)

# step 3: Help!
for node in filtered(document):
   print(convert_to_html(node))

I found the HTMLTranslator class and the Publisher class. They look relevant, but I am struggling to find good documentation. How should I implement the convert_to_html function?

Answer 1

My problem was that I was trying to use the docutils package at too low a level. It provides an interface for exactly this kind of thing:

from docutils.core import publish_doctree, publish_from_doctree

rst = '*NB:* just an example.'

# step 1
tree = publish_doctree(rst)

# step 2
# do something with the tree

# step 3
html = publish_from_doctree(tree, writer_name='html').decode()
print(html)

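Step one is now much simpler. Even so, I am a little dissatisfied with the result; I realised that what I really want is a publish_node function. If you know a better way, please post it.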
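
One possible shape for such a helper is sketched below. It is not part of docutils: publish_node is a hypothetical name, and the sketch assumes that publish_parts accepts an already-built tree when it is given DocTreeInput as the source class together with the "doctree" reader and the "null" parser. The node is copied into a fresh, otherwise empty document, so references that point outside the copied fragment may not resolve.

```python
from docutils import utils
from docutils.core import publish_doctree, publish_parts
from docutils.io import DocTreeInput


def publish_node(node, writer_name="html5"):
    """Hypothetical helper (not a docutils API): render one node as HTML."""
    # Put a deep copy of the node into a fresh, otherwise empty document,
    # so the original tree is left untouched.
    wrapper = utils.new_document("<publish_node>")
    wrapper += node.deepcopy()

    # Hand the ready-made tree to publish_parts: DocTreeInput passes the
    # tree through, the "doctree" reader skips parsing, and the "null"
    # parser does nothing.
    parts = publish_parts(
        source=wrapper,
        source_class=DocTreeInput,
        reader_name="doctree",
        parser_name="null",
        writer_name=writer_name,
    )
    # "body" holds the rendered content without the surrounding HTML page.
    return parts["body"]


tree = publish_doctree("First paragraph.\n\nSecond *paragraph*.")
fragments = [publish_node(child) for child in tree.children]
for fragment in fragments:
    print(fragment)
```

Each top-level node is rendered on its own here, so the fragments can be filtered or reordered before they are joined into a page.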

I should also note that I have not managed to get this working with Python 3.
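
For readers on Python 3, the sketch below shows the same steps with one change. It assumes a docutils release that runs on Python 3 and relies on the documented output_encoding setting: the special value 'unicode' asks docutils to return a str instead of encoded bytes, so the .decode() call is not needed. Whether this addresses the problem described above is not known.

```python
from docutils.core import publish_doctree, publish_from_doctree

rst = '*NB:* just an example.'

# step 1: parse the ReST source into a document tree
tree = publish_doctree(rst)

# step 3: render the tree; 'unicode' requests a str rather than bytes
html = publish_from_doctree(
    tree,
    writer_name='html',
    settings_overrides={'output_encoding': 'unicode'},
)
print(html)
```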

The real lesson

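What I was actually trying to do was extract all the sidebar elements from the document tree, so that they could be processed separately from the main body of the article. That is not a use case docutils sets out to address, which is why there is no publish_node function.

Once I realised this, the right approach was simple enough: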

Generate the HTML with docutils.
Extract the sidebar elements with BeautifulSoup.

Here is the code that does the job:

from docutils.core import publish_parts
from bs4 import BeautifulSoup

rst = get_rst_string_from_somewhere()

# get just the body of an HTML document 
html = publish_parts(rst, writer_name='html')['html_body']
soup = BeautifulSoup(html, 'html.parser')

# docutils wraps the body in a div with the .document class
# we can just dispose of that div altogether
wrapper = soup.select('.document')[0]
wrapper.unwrap()

# knowing that docutils gives all sidebar elements the
# .sidebar class makes extracting those elements easy
sidebar = ''.join(tag.extract().prettify() for tag in soup.select('.sidebar'))

# leaving the non-sidebar elements as the document body
body = soup.prettify()
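
To try the approach end to end, here is a self-contained run of the same steps on a made-up input. The sample ReST text is invented for illustration, and the html4css1 writer is named explicitly on the assumption that it is the writer producing the .document wrapper div; other HTML writers may wrap the body differently.

```python
from bs4 import BeautifulSoup
from docutils.core import publish_parts

# Sample input (made up for illustration): one paragraph and one sidebar.
rst = """\
Main text.

.. sidebar:: Note title

   Sidebar text.
"""

# Render only the body; html4css1 wraps it in <div class="document">.
html = publish_parts(rst, writer_name='html4css1')['html_body']
soup = BeautifulSoup(html, 'html.parser')
soup.select('.document')[0].unwrap()

# Pull every sidebar out of the soup, then keep the rest as the body.
sidebar = ''.join(tag.extract().prettify() for tag in soup.select('.sidebar'))
body = soup.prettify()

print(sidebar)
print(body)
```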

Answer 2

You want publish_parts from docutils.core!

import docutils.io  # makes docutils.io.FileInput available explicitly
from docutils.core import publish_parts

if __name__ == "__main__":
  # Convert a string:
  parts = publish_parts(
    source = "Hello\n========\n\nThis is my document.",
    writer_name = "html5"
  )
  # Prints only "<p>This is my document.</p>"
  print(parts["body"])

  # To convert a file:
  parts = publish_parts(
    source = None,
    source_path = "path/to/doc.rst",
    source_class = docutils.io.FileInput,
    writer_name = "html5"
  )
  print(parts["body"])

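Don't forget that if you want to use the parts, you still have to do some string substitution yourself. For example, as the documentation mentions, you still have to set the encoding in the output even if you use the "whole" part.

To see which parts are available, check the documentation: https://docutils.sourceforge.io/docs/api/publisher.html#publish-parts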
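
The part names can also be inspected at runtime, since publish_parts returns a plain dictionary. A small sketch, reusing the sample document from above; the exact set of names depends on the writer and the docutils version:

```python
from docutils.core import publish_parts

parts = publish_parts(
    source="Hello\n========\n\nThis is my document.",
    writer_name="html5",
)

# List the part names this writer produced.
for name in sorted(parts):
    print(name)

# The lone top-level title is promoted to the document title, so it lands
# in its own part rather than in "body".
print(repr(parts["title"]))
```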
