I am trying to remove some empty text tags from the XML file returned by a Python function, but I get this error:
TypeError: object of type 'lxml.etree._ElementTree' has no len()
Why?
Here is the function:
from lxml import etree
from bs4 import BeautifulSoup

def due(pdfpath):
    ntree = uniform_cm(pdfpath)
    etree.strip_tags(ntree, 'textline')
    # Search for all "textbox" elements
    for textbox in ntree.xpath('//textbox'):
        new_line = etree.Element("new_line")
        previous_bb = None
        # From a given textbox element, iterate over all the "text" elements
        for x in textbox.iter("text"):
            # Get current bb value
            bb = getBBoxFirstValue(x)
            # Check current and past values aren't empty
            if bb is not None and previous_bb is not None and (bb - previous_bb) > 20:
                # Insert new_line into parent tag
                x.getparent().insert(x.getparent().index(x), new_line)
                # A new "new_line" element is created
                new_line = etree.Element("new_line")
            # Append current element to the new_line tag
            new_line.append(x)
            # Keep latest non-empty BBox first value
            if bb is not None:
                previous_bb = bb
        # Add last new_line element
        textbox.append(new_line)
    tree = ntree
    soup = BeautifulSoup(tree, "lxml")
    for x in soup.find_all():
        if len(x.get_text(strip=True)) == 0:
            x.extract()
    return tree
The only place where len is used is: if len(x.get_text(strip=True)) == 0:
But I checked type(x) and got bs4.element.Tag, whereas your error message says 'lxml.etree._ElementTree' has no len().
Apparently your error occurs somewhere else.
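A likely candidate (an inference, not confirmed against the asker's stack trace) is the line soup = BeautifulSoup(tree, "lxml"): BeautifulSoup expects a string or a file-like object, and its constructor checks len() of non-file markup, so handing it an lxml _ElementTree would raise exactly this TypeError. Note also that the function returns tree, not soup, so tags extracted from soup would not disappear from the returned tree anyway. Below is a minimal sketch that skips BeautifulSoup and prunes empty elements directly on the tree. The helper names remove_empty_elements and _drop_keep_tail are hypothetical; "empty" is assumed to mean "no non-whitespace text anywhere inside the element", mirroring get_text(strip=True); and the stdlib xml.etree.ElementTree fallback exists only so the sketch runs without lxml.

```python
# Sketch: delete elements with no text content, working on the parsed tree
# itself instead of round-tripping through BeautifulSoup.
try:
    from lxml import etree as et  # the library used in the question
except ImportError:  # fallback (assumption) so the sketch runs without lxml
    import xml.etree.ElementTree as et


def _drop_keep_tail(parent, child):
    """Remove child from parent but keep the text that follows it (its tail)."""
    if child.tail:
        siblings = list(parent)
        idx = siblings.index(child)
        if idx > 0:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def remove_empty_elements(tree):
    """Delete every non-root element with no non-whitespace text inside it.

    Accepts either a whole document (ElementTree) or an Element.
    """
    # A document object only wraps the elements; work on its root element.
    root = tree.getroot() if hasattr(tree, "getroot") else tree

    def prune(parent):
        for child in list(parent):  # copy, because parent is mutated in the loop
            if not "".join(child.itertext()).strip():
                _drop_keep_tail(parent, child)
            else:
                prune(child)

    prune(root)
    return tree
```

In due(), the three BeautifulSoup lines could then be replaced by remove_empty_elements(ntree) before returning ntree. If BeautifulSoup is really needed, serializing first, as in BeautifulSoup(etree.tostring(tree), "lxml"), avoids the TypeError, but the removals would then only happen in soup.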
A tip for the future: when you ask about the cause of an exception, state exactly which line raises it. The stack trace tells you this.
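The same information can be pulled out programmatically with the standard traceback module: the innermost frame of the traceback is where the exception was actually raised. This is a small sketch; outer, inner and where_it_failed are hypothetical names used only for illustration. Reading a printed traceback from the bottom up gives the same answer.

```python
import traceback


def where_it_failed(exc):
    """Return (function name, line number) of the innermost frame of exc."""
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    return innermost.name, innermost.lineno


def outer(obj):
    return inner(obj)


def inner(obj):
    return len(obj)  # raises TypeError if obj has no __len__


try:
    outer(object())
except TypeError as exc:
    # The last frame is inner(), the line that actually called len(),
    # not outer() or the try block that started the call chain.
    print(where_it_failed(exc))
```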
So I did some investigation of my own, independent of your code sample.
When you parse an XML file with lxml, for example:
from lxml import etree as et
tree = et.parse('Input.xml')
the type of tree (the whole XML document) is simply lxml.etree._ElementTree.
If you now try to run len(tree), you get:
TypeError: object of type 'lxml.etree._ElementTree' has no len()
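A self-contained reproduction, assuming an in-memory document (io.BytesIO) in place of Input.xml. The stdlib fallback is only there so the sketch runs without lxml; in that case the message names 'ElementTree' instead of 'lxml.etree._ElementTree'.

```python
import io

try:
    from lxml import etree as et
except ImportError:  # stdlib fallback with the same document/element split
    import xml.etree.ElementTree as et

# In-memory stand-in for 'Input.xml'; parse() also accepts file-like objects.
xml_file = io.BytesIO(b"<doc><item/><item/></doc>")
tree = et.parse(xml_file)  # the whole document, not its root element

try:
    len(tree)
except TypeError as exc:
    print(f"{type(tree).__name__}: {exc}")
```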
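But if you take the root element from the tree, root = tree.getroot(), then root is of type lxml.etree._Element (note: an Element now, not the whole document), and you can call len(root) to get the number of its direct children. Everything else works as it does for any element of an XML tree.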
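A sketch of what len() counts once you hold the root element. The small document is made up for illustration and wrapped with et.ElementTree(...) to stand in for a parsed file; the stdlib fallback is again only for running without lxml.

```python
try:
    from lxml import etree as et
except ImportError:
    import xml.etree.ElementTree as et

# et.ElementTree(element) wraps an element in a document object, like parse() returns.
tree = et.ElementTree(et.fromstring("<doc><a/><b><c/><c/></b></doc>"))
root = tree.getroot()  # an Element: supports len(), indexing, iteration

print(len(root))                        # 2: only the direct children <a> and <b>
print([child.tag for child in root])    # ['a', 'b']
print(len(root[1]))                     # 2: the two <c> elements inside <b>
print(sum(1 for _ in root.iter()) - 1)  # 4: every descendant, root excluded
```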
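Also note the following inconsistency in lxml: when you read XML content from a string, i.e. root = et.XML(some_text_variable), the result is the root element, not a document tree, so you can call len(root) on it directly.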
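A sketch contrasting the two entry points: et.XML() on a string returns the root element directly, while et.parse() returns a document whose getroot() yields an equivalent element. The sample string and the io.BytesIO wrapper are illustrative assumptions.

```python
import io

try:
    from lxml import etree as et
except ImportError:
    import xml.etree.ElementTree as et

text = "<doc><a/><b/></doc>"

root_from_string = et.XML(text)                                 # already an Element
root_from_file = et.parse(io.BytesIO(text.encode())).getroot()  # Element via the document

print(len(root_from_string))  # 2
print(et.tostring(root_from_string) == et.tostring(root_from_file))  # True
```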