TypeError: object of type 'lxml.etree._ElementTree' has no len()


I am trying to remove some empty text tags from an XML document returned by a Python function, but I get this error: TypeError: object of type 'lxml.etree._ElementTree' has no len(). Why?

Here is the function:

from bs4 import BeautifulSoup
from lxml import etree


def due(pdfpath):
    # uniform_cm and getBBoxFirstValue are the asker's own helpers (not shown here)
    ntree = uniform_cm(pdfpath)
    etree.strip_tags(ntree, 'textline')

    # Search for all "textbox" elements
    for textbox in ntree.xpath('//textbox'):
        new_line = etree.Element("new_line")
        previous_bb = None

        # From a given textbox element, iterate over all the "text" elements
        for x in textbox.iter("text"):
            # Get the current bb value
            bb = getBBoxFirstValue(x)
            # Check that the current and previous values are not empty
            if bb is not None and previous_bb is not None and (bb - previous_bb) > 20:
                # Insert new_line into the parent tag
                x.getparent().insert(x.getparent().index(x), new_line)

                # Create a fresh "new_line" element
                new_line = etree.Element("new_line")

            # Append the current element to the new_line tag
            new_line.append(x)

            # Keep the latest non-empty BBox first value
            if bb is not None:
                previous_bb = bb

        # Add the last new_line element to the textbox
        textbox.append(new_line)
    tree = ntree

    soup = BeautifulSoup(tree, "lxml")

    for x in soup.find_all():
        if len(x.get_text(strip=True)) == 0:
            x.extract()

    return tree
python python-3.x beautifulsoup lxml elementtree
1 Answer
In your code sample, the only call to len is:

if len(x.get_text(strip=True)) == 0:

But I checked type(x) and got bs4.element.Tag, whereas your error message says 'lxml.etree._ElementTree' has no len().

Evidently, your error occurs somewhere else.
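One candidate worth checking, offered only as a guess until the traceback confirms it, is any line that hands the tree itself to code expecting text, such as the BeautifulSoup(tree, "lxml") call in the question. If that turns out to be the failing line, serialising the tree first gives an object that does have a length. A minimal sketch with made-up XML content:

```python
from lxml import etree

# Made-up document standing in for the tree returned by the asker's helper.
tree = etree.ElementTree(etree.fromstring("<pages><text>a</text><text/></pages>"))

# Serialise the whole tree to a str; unlike the tree, a str supports len().
markup = etree.tostring(tree, encoding="unicode")
markup_length = len(markup)
print(markup_length)

# Assumption: the string could then be parsed with BeautifulSoup(markup, "lxml")
# instead of passing the _ElementTree object directly.
```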

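A suggestion for the future: when you are looking for the cause of an exception, state precisely which line it occurs on. The stack trace tells you this.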
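As a rough illustration of reading that information programmatically, the sketch below uses only the standard traceback module. The Document class and both function names are hypothetical stand-ins, not part of the question's code.

```python
import traceback


class Document:
    """Hypothetical stand-in for an object that defines no __len__."""


def count_items(obj):
    # len() raises TypeError when obj has no __len__
    return len(obj)


def failing_frame(func, *args):
    """Call func(*args); if it raises, return (function name, line number,
    source line) of the innermost Python frame. Return None otherwise."""
    try:
        func(*args)
    except Exception as exc:
        last = traceback.extract_tb(exc.__traceback__)[-1]
        return last.name, last.lineno, last.line
    return None


result = failing_frame(count_items, Document())
name, lineno, line = result
print(f"raised in {name}() at line {lineno}: {line}")
```

The last entry of the extracted traceback is the frame where the exception was actually raised, which is the line to quote when asking for help.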

So I did some investigation of my own, although it is not tied to your code sample.

When you parse an XML file with lxml, for example:

from lxml import etree as et
tree = et.parse('Input.xml')

the type of tree (the whole XML document) is simply lxml.etree._ElementTree.

If you now try to run len(tree), you get:

TypeError: object of type 'lxml.etree._ElementTree' has no len()

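But when you read the root element from the tree with root = tree.getroot(), the type of root is lxml.etree._Element (note: an element now, not the whole document), so you can run len(root) and get the number of its direct children. The same holds for any other element of the XML tree.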
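The difference can be checked with a small in-memory document. This is only a sketch: the XML content is made up, and the tree is built with et.ElementTree rather than parsed from a file so the snippet runs on its own.

```python
from lxml import etree as et

# Made-up document with three direct children under the root.
root = et.fromstring("<page><textbox/><textbox/><textbox/></page>")
tree = et.ElementTree(root)

tree_type = type(tree).__name__            # name of the document-tree class
root_type = type(tree.getroot()).__name__  # name of the element class

# The document tree does not support len() ...
try:
    len(tree)
    tree_has_len = True
except TypeError:
    tree_has_len = False

# ... but an element does: it counts direct children only.
child_count = len(tree.getroot())

print(tree_type, root_type, tree_has_len, child_count)
```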

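Also note the following inconsistency in lxml:

When you read XML content from a string, i.e. root = et.XML(some_text_variable), the result is the root element, not the document tree.

So in that case you can call len(root) directly.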
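A short sketch of the string case follows; the content of some_text_variable is invented for illustration. It also shows getroottree(), which goes from an element back to its document tree when the tree is needed.

```python
from lxml import etree as et

# Invented XML content for illustration.
some_text_variable = "<page><text>a</text><text>b</text></page>"

# et.XML returns the root element, so len() works straight away.
root = et.XML(some_text_variable)
is_element = isinstance(root, et._Element)
child_count = len(root)

# If the document tree is needed, it can be obtained from the element.
doc = root.getroottree()
is_tree = isinstance(doc, et._ElementTree)
root_tag = doc.getroot().tag

print(is_element, child_count, is_tree, root_tag)
```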
