How to give an LLM in LangChain additional information about enum values

Question

I am trying to use LangChain in Python to extract structured information with an LLM (e.g. GPT-4). My goal is to classify companies by associating them with labels.

My output class looks like this:

from langchain_core.pydantic_v1 import BaseModel

# Industry and Customer are the enum classes defined further below
class Company(BaseModel):
    industry: list[Industry]
    customer: list[Customer]

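So far so good. The problem is that some labels can be rather specific, and I would like to give the LLM more information to help it choose between the options. Using Enum from aenum, as described here, I can for example add docstrings to the enum values: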

from aenum import Enum

class Industry(Enum):
    _init_ = 'value __doc__'
    it = "Information Technology", "All kinds of computer stuff"
    agriculture = "Agriculture", "Farming, irrigation, fertilizers etc."

class Customer(Enum):
    _init_ = 'value __doc__'
    B2C = "B2C", "Companies selling directly to consumers"
    B2B = "B2B", "Companies selling to other businesses"

Now I have my values plus some helpful explanations, but there is no direct way to pass them on to the LLM.

If I use .with_structured_output() or PydanticOutputParser, the docstrings of the enum members are not passed along:

from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=Company)

parser.get_format_instructions()
# 'The output should be formatted as a JSON instance that conforms to the JSON schema below.
# As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
# the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
# Here is the output schema:
# ```
# {"properties": {"industry": {"type": "array", "items": {"$ref": "#/definitions/Industry"}}, "customer": {"type": "array", "items": {"$ref": "#/definitions/Customer"}}}, "required": ["industry", "customer"], "definitions": {"Industry": {"title": "Industry", "description": "An enumeration.", "enum": ["Information Technology", "Agriculture"]}, "Customer": {"title": "Customer", "description": "An enumeration.", "enum": ["B2C", "B2B"]}}}
#```'

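As a workaround I could of course write a custom prompt that spells out the docstrings explicitly, but I am curious whether anyone has found a more direct way to do this.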
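A minimal sketch of that custom-prompt workaround follows. It assumes the standard-library enum module instead of aenum (swapped in only so the example has no third-party dependency) and stores each explanation in a hypothetical description attribute rather than __doc__. The helper name describe_enums is likewise illustrative, not a LangChain API.

```python
from enum import Enum


class DescribedEnum(str, Enum):
    """Stdlib stand-in for aenum's `_init_ = 'value __doc__'` trick:
    each member is declared as a (value, description) tuple."""

    def __new__(cls, value: str, description: str):
        member = str.__new__(cls, value)
        member._value_ = value             # the label the LLM must output
        member.description = description  # hypothetical attribute: extra hint for the LLM
        return member


class Industry(DescribedEnum):
    it = "Information Technology", "All kinds of computer stuff"
    agriculture = "Agriculture", "Farming, irrigation, fertilizers etc."


class Customer(DescribedEnum):
    B2C = "B2C", "Companies selling directly to consumers"
    B2B = "B2B", "Companies selling to other businesses"


def describe_enums(*enum_classes: type) -> str:
    """Render every allowed value and its description as plain prompt text."""
    lines = []
    for enum_cls in enum_classes:
        lines.append(f"Allowed values for {enum_cls.__name__}:")
        for member in enum_cls:
            lines.append(f'- "{member.value}": {member.description}')
    return "\n".join(lines)


print(describe_enums(Industry, Customer))
```

The rendered text could then be placed in the prompt next to the output of parser.get_format_instructions(). Because the members are still plain string values, Industry("Agriculture") resolves to Industry.agriculture as usual, so parsing the LLM's answer is unaffected.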

Answer 1

You can add a field to the generated JSON schema by overriding the __get_pydantic_json_schema__ classmethod (this requires pydantic v2), like this:

from aenum import Enum
from pydantic.json_schema import GetJsonSchemaHandler
from pydantic_core import core_schema

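class EnumSchemaDoc(Enum):
    # pydantic v2 calls this hook when it builds the JSON schema of a field typed with this enum
    @classmethod
    def __get_pydantic_json_schema__(cls, core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler):
        # Let pydantic generate the default schema for the enum first
        schema = handler(core_schema)
        # Then attach a custom "documentation" key mapping each value to its docstring
        schema['documentation'] = {e.value : e.__doc__ for e in cls}
        return schema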

class Industry(EnumSchemaDoc):
    _init_ = 'value __doc__'
    it = "Information Technology", "All kinds of computer stuff"
    agriculture = "Agriculture", "Farming, irrigation, fertilizers etc."

class Customer(EnumSchemaDoc):
    _init_ = 'value __doc__'
    B2C = "B2C", "Companies selling directly to consumers"
    B2B = "B2B", "Companies selling to other businesses"


from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Company(BaseModel):
    industry: Industry
    customer: Customer

parser = PydanticOutputParser(pydantic_object=Company)

parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"$defs": {"Customer": {"documentation": {"B2B": "Companies selling to other businesses", "B2C": "Companies selling directly to consumers"}, "enum": ["B2C", "B2B"], "title": "Customer", "type": "string"}, "Industry": {"documentation": {"Agriculture": "Farming, irrigation, fertilizers etc.", "Information Technology": "All kinds of computer stuff"}, "enum": ["Information Technology", "Agriculture"], "title": "Industry", "type": "string"}}, "properties": {"industry": {"$ref": "#/$defs/Industry"}, "customer": {"$ref": "#/$defs/Customer"}}, "required": ["industry", "customer"]}\n```'
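A variation that is not part of the answer above: put the explanations into the standard JSON Schema "description" keyword instead of a custom "documentation" key, on the assumption that a standard annotation keyword is more likely to be understood by models and tools. The sketch post-processes a schema dict of the shape pydantic v2 produced above (copied in as a literal, minus the "documentation" keys, so it runs with the standard library only). The names add_enum_descriptions, COMPANY_SCHEMA and ENUM_DOCS are hypothetical.

```python
import copy
import json

# Schema in the shape pydantic v2 generated above, without the custom
# "documentation" keys (in practice it could come from Company.model_json_schema()).
COMPANY_SCHEMA = {
    "$defs": {
        "Customer": {"enum": ["B2C", "B2B"], "title": "Customer", "type": "string"},
        "Industry": {"enum": ["Information Technology", "Agriculture"], "title": "Industry", "type": "string"},
    },
    "properties": {
        "industry": {"$ref": "#/$defs/Industry"},
        "customer": {"$ref": "#/$defs/Customer"},
    },
    "required": ["industry", "customer"],
}

# Hypothetical value -> explanation table, taken from the question's enums.
ENUM_DOCS = {
    "Industry": {
        "Information Technology": "All kinds of computer stuff",
        "Agriculture": "Farming, irrigation, fertilizers etc.",
    },
    "Customer": {
        "B2C": "Companies selling directly to consumers",
        "B2B": "Companies selling to other businesses",
    },
}


def add_enum_descriptions(schema: dict, docs: dict) -> dict:
    """Return a patched copy of `schema`; the input dict is left untouched."""
    patched = copy.deepcopy(schema)
    for enum_name, value_docs in docs.items():
        definition = patched["$defs"][enum_name]
        # Describe only values the schema actually allows, in schema order.
        parts = [f"{v}: {value_docs[v]}" for v in definition["enum"] if v in value_docs]
        definition["description"] = "Meaning of each allowed value: " + "; ".join(parts)
    return patched


print(json.dumps(add_enum_descriptions(COMPANY_SCHEMA, ENUM_DOCS)["$defs"]["Customer"], indent=2))
```

Inside the answer's __get_pydantic_json_schema__, the same idea would amount to assigning a string to schema['description'] instead of a dict to schema['documentation'].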
