Accessing shapes and text boxes in python-docx

Question

This code does not seem to be able to access the text inside shapes. Is there a way to do that?

from docx import Document

doc = Document('template.docx')
replace_word = {'Captain': 'Gerard B. Geronimo'}
for word in replace_word:
    for p in doc.paragraphs:
        if p.text.find(word) >= 0:
            p.text = p.text.replace(word, replace_word[word])

doc.save('note_demo.docx')

I have tried everything I could find on the internet, but nothing worked.

Answer 1

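The text in shapes and text boxes is not stored in paragraphs that are direct children of the document body, and Document.paragraphs contains only the paragraphs that are direct children of the document body.

Document.paragraphs does not even include the paragraphs inside table cells, so text in tables needs a different approach as well.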
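For the table case, a minimal sketch of one possible approach is to walk Document.tables down to each cell's paragraphs. The helper names iter_table_paragraphs and replace_in_tables are hypothetical, not part of python-docx; the sketch only relies on the tables, rows, cells, paragraphs and text attributes that python-docx objects expose.

```python
def iter_table_paragraphs(container):
    """Yield every paragraph found in the tables of `container`.

    `container` is anything with a `.tables` attribute, such as a
    python-docx Document or a table cell, so nested tables are covered too.
    (Hypothetical helper, not part of python-docx.)
    """
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs
                # A cell can itself contain tables; recurse into them.
                yield from iter_table_paragraphs(cell)


def replace_in_tables(document, replacements):
    """Replace each key of `replacements` with its value in all table paragraphs."""
    for paragraph in iter_table_paragraphs(document):
        for old, new in replacements.items():
            if old in paragraph.text:
                # Assigning paragraph.text rewrites the paragraph as a single
                # run, so run-level formatting inside it is lost.
                paragraph.text = paragraph.text.replace(old, new)
```

Calling replace_in_tables(doc, replace_word) before doc.save(...) would apply the question's replacements to table text. It still does not reach shapes or text boxes.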

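However, all text, including the text in shapes and text boxes, is stored in w:t elements (CT_Text in python-docx) of the underlying document body XML. An XPath query with the path .//w:t can therefore fetch all w:t elements (CT_Text) below the w:body element (CT_Body). Once you have these elements, you can replace their text content.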
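To illustrate why the descendant query reaches text that a walk over top-level paragraphs misses, here is a small sketch using only the standard library. The XML fragment is hand-written and heavily simplified (an assumption for illustration: real documents wrap text boxes in several additional layers of drawing markup), and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Simplified stand-in for a document body: one ordinary paragraph and one
# paragraph whose run contains a text box (w:txbxContent) with its own paragraph.
BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p><w:r><w:t>Dear Captain</w:t></w:r></w:p>
  <w:p>
    <w:r>
      <w:txbxContent>
        <w:p><w:r><w:t>Signed, Captain</w:t></w:r></w:p>
      </w:txbxContent>
    </w:r>
  </w:p>
</w:body>
""".strip()

body = ET.fromstring(BODY_XML)

# Only w:t elements that sit directly in runs of top-level paragraphs.
top_level = [t.text for t in body.findall("./w:p/w:r/w:t", NS)]

# Every w:t element anywhere below w:body, in document order.
all_text = [t.text for t in body.findall(".//w:t", NS)]

print(top_level)  # ['Dear Captain']
print(all_text)   # ['Dear Captain', 'Signed, Captain']
```

The second query is the same .//w:t path that the answer passes to python-docx's xpath method.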

Example:

from docx import Document

document = Document('template.docx')

replace_word = {'Captain': 'Gerard B. Geronimo'}

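# Visit every w:t element below w:body, including those inside
# tables, shapes and text boxes.
for text_element in document._body._element.xpath(".//w:t"):
    # An empty w:t element has text None, so skip it before searching.
    if not text_element.text:
        continue
    for word, replacement in replace_word.items():
        if word in text_element.text:
            text_element.text = text_element.text.replace(word, replacement)
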
document.save('result.docx')

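This approach works as long as the searched word is not split across multiple text runs, and therefore across multiple w:t (CT_Text) elements. For a word like "Captain" that will not happen unless part of the word is formatted differently from the rest. But when the word contains spaces and/or special characters, it happens very quickly. Microsoft Word is a beast when it comes to splitting text into separate runs, and it does so for many reasons.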
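When a word is split across runs, one possible workaround is to search the joined text of all runs and then rewrite the affected runs. The sketch below shows the idea on plain strings. The function replace_across_runs is hypothetical (not part of python-docx), it replaces only the first occurrence, and it puts the whole replacement into the first run that overlaps the match, so the replacement takes on that run's formatting.

```python
def replace_across_runs(run_texts, old, new):
    """Return new run texts with the first occurrence of `old` replaced by `new`.

    `run_texts` is the list of text fragments of consecutive runs. The match
    may start in one run and end in a later one. (Hypothetical helper.)
    """
    full = "".join(run_texts)
    start = full.find(old)
    if not old or start < 0:
        return list(run_texts)  # nothing to replace
    end = start + len(old)

    result = []
    offset = 0        # position of the current run within `full`
    inserted = False  # whether `new` has already been written
    for text in run_texts:
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # run does not overlap the match
            continue
        # Keep the parts of this run that lie outside the match.
        kept_before = text[: max(start - run_start, 0)]
        kept_after = text[end - run_start:]
        middle = "" if inserted else new
        inserted = True
        result.append(kept_before + middle + kept_after)
    return result


runs = ["Dear Cap", "tain ", "Smith"]
print(replace_across_runs(runs, "Captain", "Gerard B. Geronimo"))
# ['Dear Gerard B. Geronimo', ' ', 'Smith']
```

With python-docx this could be applied per paragraph by passing [run.text for run in paragraph.runs] and assigning each returned string back to the corresponding run.text. The number of runs stays the same, so the remaining formatting is left in place.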
