AttributeError: .docx Document object has no attribute 'xpath'

Question

from docx import *
document = Document(r'filepath.docx')
words = document.xpath('//w:r', namespaces=document.nsmap)
WPML_URI = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
tag_rPr = WPML_URI + 'rPr'
tag_highlight = WPML_URI + 'highlight'
tag_val = WPML_URI + 'val'
tag_t = WPML_URI + 't'
for word in words:
    for rPr in word.findall(tag_rPr):
        high = rPr.findall(tag_highlight)
        for hi in high:
            if hi.attribute[tag_val] == 'yellow':
                print(word.find(tag_t).text.encode('utf-8').lower())

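In theory this code should read the document text and then find the text highlighted in yellow. My problem is that when I run the code as-is, it fails right at the start with AttributeError: 'Document' object has no attribute 'xpath'. The problem is clearly the line words = document.xpath('//w:r', namespaces=document.nsmap), and I don't know how to fix it.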

Answer 1

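@PirateNinjas is right. The Document object is not a subclass of lxml.etree._Element, so it has no .xpath() method. That is what the AttributeError is telling you: every method on an object is an attribute (just like an instance variable), and if the name you ask for does not exist, you get this error.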
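To illustrate the point about methods being attributes, here is a minimal sketch that uses a hypothetical Wrapper class (not part of python-docx) standing in for any object that keeps its XML element in a private attribute but defines no xpath() of its own:

```python
class Wrapper:
    """Hypothetical stand-in for a class that wraps an element but has no xpath()."""

    def __init__(self):
        self._element = []  # placeholder for the wrapped XML element


wrapper = Wrapper()

# A method is looked up like any other attribute, so hasattr() can check for it.
has_method = hasattr(wrapper, "xpath")

try:
    wrapper.xpath("//w:r")
except AttributeError as exc:
    # Same kind of message as in the question, with this class's name in it.
    message = str(exc)

print(has_method)
print(message)
```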

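However, Document._element is an instance of an _Element subclass and may work for you. At the very least it won't raise this error, and it should get you moving in the right direction. This code should give you all the <w:r> elements in the main story of the document (that is, the document body, but not headers, footnotes, etc.):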

rs = document._element.xpath("//w:r")

Answer 2

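The problem is that you are trying to do something with docx.Document that it does not support. If you look at the documentation here, you can see that Document has no .xpath.

If you need the words, you can get them through Document.paragraphs, which is also described in the linked documentation.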
