I am running this code:
from unstructured.documents.html import HTMLDocument
# Load your HTML file
html_file_path = 'UBER_2019.html'
doc = HTMLDocument.from_file(html_file_path)
# Extract text
text = doc.text
I get the following error:
ModuleNotFoundError Traceback (most recent call last)
Cell In[3], line 1
----> 1 from unstructured.documents.html import HTMLDocument
3 # Load your HTML file
4 html_file_path = 'UBER_2019.html'
ModuleNotFoundError: No module named 'unstructured.documents.html'
What can I do to fix this?
You need to install the unstructured package:
pip install unstructured
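After installing, it can be worth confirming which version ended up in the environment your notebook actually uses. A pip run from a terminal may target a different interpreter than the Jupyter kernel that raised the error. Below is a minimal sketch using the standard-library importlib.metadata module. The helper name installed_version is hypothetical and is not part of unstructured.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the interpreter running this code.
        return None


if __name__ == "__main__":
    # Run this in the same kernel/interpreter that raised the ModuleNotFoundError.
    print(installed_version("unstructured"))
```

If this prints None inside the notebook even after pip reported success, the install went into a different environment than the one the kernel uses.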
https://pypi.org/project/unstructured/
Then try:
from unstructured.documents.html import HTMLDocument
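When an import fails, Python's ModuleNotFoundError names the first component of the dotted path it could not find. This traceback names unstructured.documents.html rather than unstructured. That suggests the package itself is already importable and the installed version does not provide the documents.html submodule. If so, reinstalling alone may not help. Below is a minimal diagnostic sketch using only importlib.util.find_spec from the standard library, which imports parent packages while checking a submodule. The helper name diagnose_import is hypothetical. It reports which case applies:

```python
import importlib.util


def diagnose_import(dotted_name: str) -> str:
    """Report how much of a dotted module path can be found."""
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            # For dotted names, find_spec imports the parent package first.
            spec = importlib.util.find_spec(prefix)
        except ModuleNotFoundError:
            # Raised, for example, when the parent is a plain module, not a package.
            spec = None
        if spec is None:
            if i == 1:
                return f"package '{prefix}' is not installed"
            parent = ".".join(parts[: i - 1])
            return f"'{parent}' is importable, but '{prefix}' was not found"
    return f"'{dotted_name}' is importable"


if __name__ == "__main__":
    print(diagnose_import("unstructured.documents.html"))
```

If the output says the package is not installed, the pip install above is the fix. If it says unstructured is importable but the submodule was not found, check the unstructured documentation for the HTML API that your installed version provides.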