Understanding the Pydantic output parser for LLM output

Question

I'm having trouble understanding output parsers (especially PydanticOutputParser) and what they actually do. I query my LLM to parse some data into this Pydantic object:


from typing import List, Optional

from pydantic import BaseModel, Field


class Diploma(BaseModel):
    diploma_name: Optional[str] = Field(None, description="diploma name")
    institution: Optional[str] = Field(None, description="institution")
    city: Optional[str] = Field(None, description="City of the institution.")
    issue_date: Optional[str] = Field(None, description="Issue date of the diploma")
    relevant_coursework: Optional[List[str]] = Field(None, description="relevant courseworks")


class Education(BaseModel):
    diplomas: List[Diploma] = Field(default_factory=list, description="List of diplomas.")

I then prompt my LLM with a query asking for the data in exactly this structure, and get the output from my local LLM in JSON mode:

{
    "diplomas": [
        {
            "diploma_name": "",
            "institution": "",
            "city": "",
            "issue_date": "",
            "relevant_coursework": [
                ""
            ]
        }
    ]
}
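When the reply really does follow this structure, loading it is a single call. A minimal sketch, assuming Pydantic v2 (model_validate_json) and a hypothetical, well-formed LLM reply; the models are the ones from the question with the Field descriptions left out for brevity.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


# Same models as in the question; Field descriptions omitted for brevity.
class Diploma(BaseModel):
    diploma_name: Optional[str] = None
    institution: Optional[str] = None
    city: Optional[str] = None
    issue_date: Optional[str] = None
    relevant_coursework: Optional[List[str]] = None


class Education(BaseModel):
    diplomas: List[Diploma] = Field(default_factory=list)


# Hypothetical LLM output that follows the requested structure exactly.
llm_output = """
{
    "diplomas": [
        {
            "diploma_name": "MSc Data Science",
            "institution": "Example University",
            "city": "Lyon",
            "issue_date": "2023-06",
            "relevant_coursework": ["Statistics", "Machine Learning"]
        }
    ]
}
"""

# Parses the JSON string and validates it against Education in one step.
education = Education.model_validate_json(llm_output)
print(education.diplomas[0].institution)  # Example University
```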

So far my code works about 6 times out of 10. When it does, every diploma is retrieved perfectly and stored in Education.

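Here is my problem and my question: the other 4 times out of 10 it fails, because the LLM's output is not structured correctly, so it never gets stored in the class object. I thought that by using an output parser, the LLM would be made to structure the data in a fixed way, like rules that cannot be broken, but I must have misunderstood something. Could someone explain how Pydantic works here, and whether there is any way to make the LLM output work close to 10 times out of 10?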

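Also, sorry for not including much code. I'm interning at a startup and don't want to disclose too much, which is why I'm trying to understand the concepts rather than get my code corrected.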

Thanks a lot for your help!

I have read a lot of the documentation on LangChain's PydanticOutputParser and JsonOutputParser. The problem seems to come from the LLM output itself.
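One way to see why the parser cannot enforce "rules": as far as I understand, LangChain's PydanticOutputParser builds its format instructions from the model's JSON schema and those instructions are simply added to the prompt. The sketch below, assuming Pydantic v2, reproduces that idea with model_json_schema() only, so it runs without LangChain; build_format_instructions is a hypothetical helper, not a LangChain function.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, Field


class Diploma(BaseModel):
    diploma_name: Optional[str] = Field(None, description="diploma name")
    institution: Optional[str] = Field(None, description="institution")
    city: Optional[str] = Field(None, description="City of the institution.")
    issue_date: Optional[str] = Field(None, description="Issue date of the diploma")
    relevant_coursework: Optional[List[str]] = Field(None, description="relevant courseworks")


class Education(BaseModel):
    diplomas: List[Diploma] = Field(default_factory=list, description="List of diplomas.")


def build_format_instructions(model: type[BaseModel]) -> str:
    """Hypothetical helper: turn a model's JSON schema into plain prompt text."""
    schema = model.model_json_schema()
    return (
        "Return only a JSON object that conforms to this JSON schema:\n"
        + json.dumps(schema, indent=2)
    )


instructions = build_format_instructions(Education)
# This string is all the LLM ever sees of the model; it can still ignore it.
print(instructions)
```

The schema travels to the LLM as ordinary text, so compliance is a matter of how well the model follows instructions; Pydantic only gets involved after the reply comes back.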

Answer 1

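I think there is a misunderstanding about the purpose of Pydantic models. They are mainly used for "validation": checking that a JSON object conforms to the fields described in the model. Producing the structured output in the first place, however, is entirely up to the LLM. A common problem with LLMs is that they can hallucinate and add extraneous text before or after the JSON output. To deal with this, you can implement a custom parser that cleans up the output, or use the from_json function provided by pydantic_core.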
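To make the validation point concrete, here is a minimal sketch, assuming Pydantic v2 and made-up LLM replies: Pydantic does not repair a reply, it only accepts or rejects it. A conversational prefix makes the whole string invalid JSON, and valid JSON with the wrong shape fails field validation. try_validate is a hypothetical helper name.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Diploma(BaseModel):
    diploma_name: Optional[str] = None
    institution: Optional[str] = None
    city: Optional[str] = None
    issue_date: Optional[str] = None
    relevant_coursework: Optional[List[str]] = None


class Education(BaseModel):
    diplomas: List[Diploma] = Field(default_factory=list)


def try_validate(raw: str) -> Optional[Education]:
    """Return an Education if raw validates, else print the first error type and return None."""
    try:
        return Education.model_validate_json(raw)
    except ValidationError as exc:
        print(exc.errors()[0]["type"])
        return None


# Hypothetical chatty reply: the JSON part is fine, but the prefix is not JSON.
chatty = 'Sure! Here is the data: {"diplomas": []}'
print(try_validate(chatty))

# Hypothetical reply that is valid JSON but has the wrong shape:
# "diplomas" is an object instead of a list.
wrong_shape = '{"diplomas": {"diploma_name": "BSc"}}'
print(try_validate(wrong_shape))
```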

Below is one implementation of a custom parser that strips the extraneous prefix before parsing.

from typing import Any

from pydantic_core import from_json


def extract_and_parse_json(input_string: str) -> Any:
    # Find the position of the first '{'
    json_start = input_string.find('{')
    if json_start == -1:
        raise ValueError("No JSON dictionary found in the input string.")

    # Extract the JSON content starting from the first '{'
    json_content = input_string[json_start:]

    try:
        # Convert JSON content to bytes and parse it using from_json
        json_bytes = json_content.encode("utf-8")
        parsed_data = from_json(json_bytes, allow_partial=True)
        return parsed_data
    except Exception as e:
        raise ValueError(f"Error parsing JSON content: {e}")
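A minimal usage sketch, assuming Pydantic v2 and a made-up reply that has both a chatty prefix and a truncated ending: the helper returns a plain dict, so it still has to go through Education.model_validate to become the typed object. The helper below is a condensed copy of the one in this answer, without the try/except wrapper; like the original, it only strips text before the first '{'.

```python
from typing import Any, List, Optional

from pydantic import BaseModel, Field
from pydantic_core import from_json


class Diploma(BaseModel):
    diploma_name: Optional[str] = None
    institution: Optional[str] = None
    city: Optional[str] = None
    issue_date: Optional[str] = None
    relevant_coursework: Optional[List[str]] = None


class Education(BaseModel):
    diplomas: List[Diploma] = Field(default_factory=list)


def extract_and_parse_json(input_string: str) -> Any:
    # Condensed copy of the helper above: skip everything before the first '{'.
    json_start = input_string.find("{")
    if json_start == -1:
        raise ValueError("No JSON dictionary found in the input string.")
    # allow_partial=True tolerates output that stops before the JSON is closed.
    return from_json(input_string[json_start:].encode("utf-8"), allow_partial=True)


# Hypothetical LLM reply with a chatty prefix and a cut-off ending.
raw = 'Here is the extracted data: {"diplomas": [{"diploma_name": "BSc", "institution": "Exa'
parsed = extract_and_parse_json(raw)
education = Education.model_validate(parsed)
print(education.diplomas[0].diploma_name)  # BSc
```

Because every Diploma field is optional, a diploma whose later fields were cut off still validates.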
