BeautifulSoup prettify changes the content, not just the layout


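I have an SVG image, which is an XML file. If I parse it with BeautifulSoup and write it back out unmodified using prettify(), the content is changed: the image renders differently and the text shifts to the left. When I convert the soup to a string with str() instead, it renders correctly.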

Input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
   width="1024"
   height="600"
   viewBox="0 0 1024 600"
   version="1.1"
   id="svg32"
   sodipodi:docname="test_text.svg"
   inkscape:version="1.4 (86a8ad7, 2024-10-11)"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:svg="http://www.w3.org/2000/svg">
  <sodipodi:namedview
     id="namedview1"
     pagecolor="#505050"
     bordercolor="#ffffff"
     borderopacity="1"
     inkscape:showpageshadow="0"
     inkscape:pageopacity="0"
     inkscape:pagecheckerboard="1"
     inkscape:deskcolor="#505050"
     inkscape:zoom="1.52"
     inkscape:cx="203.28947"
     inkscape:cy="336.84211"
     inkscape:window-width="3840"
     inkscape:window-height="2054"
     inkscape:window-x="3829"
     inkscape:window-y="-11"
     inkscape:window-maximized="1"
     inkscape:current-layer="svg32" />
  <rect
     id="Rectangle_861"
     data-name="Rectangle 861"
     width="700"
     height="394"
     fill="#1b415a"
     style="fill:#1b415a;fill-opacity:1"
     x="0"
     y="0" />
  <text
     xml:space="preserve"
     style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:74.6667px;font-family:Arial;-inkscape-font-specification:Arial;text-align:center;writing-mode:lr-tb;direction:ltr;text-anchor:middle;fill:#ffffff;fill-opacity:0.7;fill-rule:evenodd;stroke-width:3.77953;paint-order:markers fill stroke"
     x="132.21741"
     y="166.19328"
     id="text33"
     inkscape:label="_110_C_proper"><tspan
       sodipodi:role="line"
       id="tspan33"
       x="132.21741"
       y="166.19328"
       style="font-size:74.6667px;fill:#ffffff;fill-opacity:0.7">110ºC</tspan></text>
</svg>

Output:

<?xml version="1.0" encoding="utf-8"?>
<svg:svg height="600" id="svg32" inkscape:version="1.4 (86a8ad7, 2024-10-11)" sodipodi:docname="test_text.svg" version="1.1" viewBox="0 0 1024 600" width="1024" xmlns="http://www.w3.org/2000/svg" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:svg="http://www.w3.org/2000/svg">
 <sodipodi:namedview bordercolor="#ffffff" borderopacity="1" id="namedview1" inkscape:current-layer="svg32" inkscape:cx="203.28947" inkscape:cy="336.84211" inkscape:deskcolor="#505050" inkscape:pagecheckerboard="1" inkscape:pageopacity="0" inkscape:showpageshadow="0" inkscape:window-height="2054" inkscape:window-maximized="1" inkscape:window-width="3840" inkscape:window-x="3829" inkscape:window-y="-11" inkscape:zoom="1.52" pagecolor="#505050"/>
 <svg:rect data-name="Rectangle 861" fill="#1b415a" height="394" id="Rectangle_861" style="fill:#1b415a;fill-opacity:1" width="700" x="0" y="0"/>
 <svg:text id="text33" inkscape:label="_110_C_proper" style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:74.6667px;font-family:Arial;-inkscape-font-specification:Arial;text-align:center;writing-mode:lr-tb;direction:ltr;text-anchor:middle;fill:#ffffff;fill-opacity:0.7;fill-rule:evenodd;stroke-width:3.77953;paint-order:markers fill stroke" x="132.21741" xml:space="preserve" y="166.19328">
  <svg:tspan id="tspan33" sodipodi:role="line" style="font-size:74.6667px;fill:#ffffff;fill-opacity:0.7" x="132.21741" y="166.19328">
   110ºC
  </svg:tspan>
 </svg:text>
</svg:svg>

My code:

from bs4 import BeautifulSoup

bad_image_path = "test_text.svg"

with open(bad_image_path, 'r', encoding='utf8') as f:
    soup = BeautifulSoup(f, "xml")

# make optional modifications to the data

with open('test_text_converted.svg', 'w', encoding='utf8') as f:
    f.write(soup.prettify())  # makes weird changes
    # f.write(str(soup))
  1. Why does prettify modify the image, and how can I prevent it?
  2. How do I stop BeautifulSoup from adding the namespace prefix to the svg elements?
python xml svg beautifulsoup
1 Answer
  1. Why does prettify modify the image, and how can I prevent it?

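I looked at the documentation for soup.prettify(). It says:

prettify() changes the meaning of an HTML document and should not be used to reformat one. The goal of prettify() is to help you visually understand the structure of the documents you work with.

In this file the <text> element carries xml:space="preserve", so the newlines and indentation that prettify() inserts around 110ºC most likely become part of the rendered text, which would explain why it moves. Writing str(soup) instead, as the commented-out line in the question does, adds no such whitespace.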
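To make the difference concrete, here is a minimal sketch that compares str(soup) with soup.prettify() on a trimmed-down stand-in for the <text> element. The snippet string is hypothetical, and "html.parser" is used only so the sketch runs without lxml; the question itself uses the "xml" parser.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the <text> element in the question.
snippet = '<text xml:space="preserve"><tspan>110ºC</tspan></text>'

# Assumption: "html.parser" is used here only to avoid the lxml dependency.
soup = BeautifulSoup(snippet, "html.parser")

plain = str(soup)         # serializes without adding whitespace
pretty = soup.prettify()  # puts each tag and string on its own indented line

# Re-parse both outputs and compare the text content of <tspan>.
text_plain = BeautifulSoup(plain, "html.parser").tspan.string
text_pretty = BeautifulSoup(pretty, "html.parser").tspan.string

print(repr(text_plain))
print(repr(text_pretty))
```

The second repr shows extra newlines and spaces around 110ºC. A renderer that honors xml:space="preserve" has to treat that whitespace as text.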


  2. How do I stop BeautifulSoup from adding the namespace prefix to the svg elements?

I don't know how to do this in bs4, but I would consider the alternative of using Python's built-in XML parsing. Node.toprettyxml() may be useful here:

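import xml.dom.minidom

# Paste the full SVG source here (elided in this answer).
xml_string = """\
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
....
"""

# Parse the document, then serialize it with indentation.
output = xml.dom.minidom.parseString(xml_string)
print(output.toprettyxml())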
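Another standard-library option, sketched here as an assumption rather than something the answer above tested, is xml.etree.ElementTree. Registering the SVG namespace under the empty prefix makes it serialize as a default xmlns, so the elements are written as <svg>, <text> and <tspan> with no prefix. The svg_source string is a hypothetical, shortened version of the file in the question.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, trimmed-down stand-in for the Inkscape file in the question.
svg_source = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="1024" height="600">'
    '<text xml:space="preserve" x="132.21741" y="166.19328">'
    '<tspan x="132.21741" y="166.19328">110ºC</tspan></text>'
    '</svg>'
)

# Map the SVG namespace to the empty prefix (note: this registry is global).
ET.register_namespace("", SVG_NS)

root = ET.fromstring(svg_source)

# Serialize without any pretty-printing, so no whitespace is added.
output = ET.tostring(root, encoding="unicode")
print(output)
```

For the real file, the inkscape and sodipodi namespaces would need their own register_namespace calls as well. Otherwise ElementTree writes them with generated prefixes such as ns0.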