UnrecognizedImageError - image insertion error - python-docx

Question

I am trying to insert a WMF file into a docx using python-docx, which produces the following traceback.

Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 79, in <module>
    read_ppt(path, file)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 73, in read_ppt
    write_docx(ppt_data, False)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 31, in write_docx
    document.add_picture(slide_data.get('picture_location'), width=Inches(5.0))
  File "C:\Python34\lib\site-packages\docx\document.py", line 72, in add_picture
    return run.add_picture(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\text\run.py", line 62, in add_picture
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 56, in new_pic_inline
    rId, image = self.get_or_add_image(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 29, in get_or_add_image
    image_part = self._package.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 31, in get_or_add_image_part
    return self.image_parts.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 74, in get_or_add_image_part
    image = Image.from_file(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 55, in from_file
    return cls._from_stream(stream, blob, filename)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 176, in _from_stream
    image_header = _ImageHeaderFactory(stream)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 199, in _ImageHeaderFactory
    raise UnrecognizedImageError
docx.image.exceptions.UnrecognizedImageError

The image file is in .wmf format.

Any help or suggestions are appreciated.

Answer 1

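python-docx identifies the type of an image file by "recognizing" its distinctive header. That is how it can tell JPEG from PNG, TIFF, and so on. This is more reliable than mapping the filename extension and more convenient than requiring the user to state the type. It is a very common approach.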
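To make the idea of header recognition concrete, here is a minimal sketch using only the standard library. The signature table and the name `sniff_image_type` are illustrative assumptions, not python-docx's actual code or its full list of formats.

```python
from typing import Optional

# (format name, byte offset, signature). Illustrative only: this is not
# python-docx's own table, just well-known magic numbers.
_SIGNATURES = (
    ("png", 0, b"\x89PNG\r\n\x1a\n"),
    ("jpeg", 0, b"\xff\xd8"),
    ("gif", 0, b"GIF87a"),
    ("gif", 0, b"GIF89a"),
    ("tiff", 0, b"II*\x00"),  # little-endian TIFF
    ("tiff", 0, b"MM\x00*"),  # big-endian TIFF
    ("bmp", 0, b"BM"),
)


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for name, offset, signature in _SIGNATURES:
        if header[offset:offset + len(signature)] == signature:
            return name
    # No signature matched: this is the situation in which a library
    # like python-docx gives up and raises an "unrecognized" error.
    return None
```

A file whose first bytes match none of the entries, such as a WMF file here, yields `None` regardless of its extension.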

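This error indicates that python-docx did not find a header it recognizes. Windows Metafile (WMF) can be tricky: the specification is proprietary and leaves a lot of leeway, and file samples in the field vary.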

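To fix this, I recommend reading the file with something that does recognize it (I would start with Pillow) and "converting" it to the same or another format, in the hope that the header gets corrected along the way. First I would try reading it and saving it as WMF (or perhaps EMF, if that is an option). That might be enough to do the trick. If you have to go through an intermediate format and back, the conversion may be lossy, but that is probably better than nothing.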

ImageMagick may be another good choice, since it probably has better coverage than Pillow.
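If you go the ImageMagick route, the conversion can be driven from Python with `subprocess`. This is only a sketch under assumptions: it presumes ImageMagick 7 is installed, that its `magick` executable is on the PATH, and that the build can read WMF. The helper names `build_convert_command` and `convert_image` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_convert_command(src: PathLike, dst: PathLike,
                          executable: str = "magick") -> List[str]:
    """Build the argument list; ImageMagick infers formats from the extensions."""
    return [executable, str(src), str(dst)]


def convert_image(src: PathLike, dst: PathLike,
                  executable: str = "magick") -> Path:
    """Run the conversion and return the output path (assumes ImageMagick 7)."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    # check=True raises CalledProcessError if ImageMagick reports a failure.
    subprocess.run(build_convert_command(src, dst, executable), check=True)
    return Path(dst)
```

For example, `convert_image('picture.wmf', 'picture.png')` would produce a PNG that can then be passed to `document.add_picture()`. Note that converting a vector metafile to PNG rasterizes it.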

Answer 2

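python-docx/image.py recognizes the different image file formats from SIGNATURES.

Formats

Starting from 1.jpg, use an image converter to convert it into different file formats, then use magic to get the MIME type of each file.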

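File format	MIME type	add_picture()
.jpg	image/jpeg	√
.png	image/png	√
.jfif	image/jpeg	√
.exif		√
.gif	image/gif	√
.tiff	image/tiff	√
.bmp	image/x-ms-bmp	√
.eps	application/postscript	×
.hdr	application/octet-stream	×
.ico	image/x-icon	×
.svg	image/svg+xml	×
.tga	image/x-tga	×
.wbmp	application/octet-stream	×
.webp	image/webp	×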

How to solve

Plan A

Convert other formats to a supported format, such as .jpg.

Install

pip install pillow

Code

from pathlib import Path

from PIL import Image


def image_to_jpg(image_path):
    """Return a path python-docx can load, converting to JPEG if needed."""
    path = Path(image_path)
    # Compare case-insensitively so that '.JPG' counts as supported too
    if path.suffix.lower() not in {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}:
        jpg_image_path = f'{path.parent / path.stem}_result.jpg'
        # JPEG has no alpha channel, so convert to RGB before saving
        Image.open(image_path).convert('RGB').save(jpg_image_path)
        return jpg_image_path
    return image_path


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    document.save('test.docx')

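Plan B

First, try adding the picture to Word manually. If that succeeds, Word supports the format. Then modify this library: subclass BaseImageHeader, implement its from_stream() method, and add the image format to SIGNATURES.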
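For WMF specifically, a from_stream() implementation would have to pull the image size out of the file header. The sketch below shows only that parsing step, using the standard library. It assumes the file is a "placeable" WMF, which begins with the 22-byte Aldus header; files without that header are not handled. The names `WmfPlaceableHeader` and `parse_placeable_wmf_header` are hypothetical and are not part of python-docx.

```python
import struct
from typing import NamedTuple, Optional

# Magic number that opens a placeable WMF file (0x9AC6CDD7, little-endian).
PLACEABLE_WMF_KEY = b"\xd7\xcd\xc6\x9a"


class WmfPlaceableHeader(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int
    units_per_inch: int

    @property
    def width_inches(self) -> float:
        return abs(self.right - self.left) / self.units_per_inch

    @property
    def height_inches(self) -> float:
        return abs(self.bottom - self.top) / self.units_per_inch


def parse_placeable_wmf_header(data: bytes) -> Optional[WmfPlaceableHeader]:
    """Return the bounding box of a placeable WMF, or None if the key is absent."""
    if len(data) < 22 or not data.startswith(PLACEABLE_WMF_KEY):
        return None
    # Layout: key (uint32), handle (uint16), left/top/right/bottom (int16 each),
    # units per inch (uint16), reserved (uint32), checksum (uint16).
    (_key, _handle, left, top, right, bottom,
     inch, _reserved, _checksum) = struct.unpack("<IHhhhhHIH", data[:22])
    if inch == 0:
        return None  # malformed header; avoid dividing by zero
    return WmfPlaceableHeader(left, top, right, bottom, inch)
```

The width and height in inches, together with a chosen DPI, are the kind of values an image header class needs to report. Wiring this into python-docx is left out here because it depends on the library's internals.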

Missing file suffix

Rename 1.jpg to 1.

from docx import Document

document = Document()
document.add_picture('1')
document.save('test.docx')

It will show this: [image not available]

Use this instead:

from docx import Document

document = Document()
document.add_picture(open('1', mode='rb'))
document.save('test.docx')

Conclusion

import io
from pathlib import Path

import magic
from PIL import Image


def image_to_jpg(image_path_or_stream):
    """Return a binary stream python-docx can load, converting to JPEG if needed."""
    f = io.BytesIO()
    if isinstance(image_path_or_stream, str):
        # A path: decide by file suffix
        path = Path(image_path_or_stream)
        if path.suffix in {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}:
            f = open(image_path_or_stream, mode='rb')
        else:
            Image.open(image_path_or_stream).convert('RGB').save(f, format='JPEG')
    else:
        # A stream: decide by the MIME type that magic detects from the content
        buffer = image_path_or_stream.read()
        mime_type = magic.from_buffer(buffer, mime=True)
        if mime_type in {'image/jpeg', 'image/png', 'image/gif', 'image/tiff', 'image/x-ms-bmp'}:
            f = image_path_or_stream
        else:
            Image.open(io.BytesIO(buffer)).convert('RGB').save(f, format='JPEG')
    return f


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    document.add_picture(image_to_jpg(open('1.jpg', mode='rb')))
    document.add_picture(image_to_jpg(open('1', mode='rb')))  # copy 1.webp and rename it to 1
    document.save('test.docx')
