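I want to use Tavily in LangChain to build a search engine chain. The chain takes the user's query as input and returns up to 5 relevant documents. Each retrieved document must have the document content as page_content and the URL of the corresponding site as metadata, following LangChain's Document definition. I have to use the langchain_core.documents.base.Document class to define the documents. So this chain will have two main parts: the Tavily search tool and a parser that turns its results into Documents.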
I wrote this code, but I don't know how to convert the Tavily output into documents in the standard format:
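from langchain_core.documents.base import Document
from langchain_core.output_parsers import PydanticOutputParser
from langchain_community.tools.tavily_search import TavilySearchResults
from pydantic import BaseModel, Field

# Tavily search tool capped at 5 results (reads the TAVILY_API_KEY environment variable)
search = TavilySearchResults(max_results=5)

class ParsedDocument(BaseModel):
    content: str = Field(description="This refers to the content of the search.")
    url: str = Field(description="This refers to the url of the search.")

# PydanticOutputParser parses text (an LLM's string or message output),
# so it cannot consume the list of dicts that TavilySearchResults returns.
search_parser = PydanticOutputParser(pydantic_object=ParsedDocument)
search_engine_chain = search | search_parser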
I would appreciate any help on how to change this code.
I finally found the answer:
from langchain_core.documents.base import Document
from pydantic import BaseModel, Field

class ParsedDocument(BaseModel):
    content: str = Field(description="This refers to the content of the search.")
    url: str = Field(description="This refers to the url of the search.")

# Define a custom parser: convert Tavily's list of result dicts into Documents
def custom_parser(search_results):
    parsed_documents = []
    for result in search_results:  # each result is a dict with 'url' and 'content' keys
        # Validate the fields with the Pydantic model, then wrap them in a Document
        parsed_document = ParsedDocument(content=result['content'], url=result['url'])
        document = Document(page_content=parsed_document.content, metadata={'url': parsed_document.url})
        parsed_documents.append(document)
    return parsed_documents

# A plain function piped after a Runnable is wrapped in a RunnableLambda automatically
search_engine_chain = search | custom_parser