How can I get the full list of classification outputs from Amazon Textract (call_textract_lending), other than "Payslips" and "Checks"? Is there a way?


I'm trying to figure out which other page types I might get besides these when I use

from textractcaller import call_textract_lending

import boto3
from textractcaller import call_textract_lending
import sagemaker
import os

document = 'lending_package.pdf'

# variables
data_bucket = sagemaker.Session().default_bucket()
region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

os.environ["BUCKET"] = data_bucket
os.environ["REGION"] = region
role = sagemaker.get_execution_role()

print(f"SageMaker role is: {role}\nDefault SageMaker Bucket: s3://{data_bucket}")

s3=boto3.client('s3')
textract = boto3.client('textract', region_name=region)

# Upload the document to the S3 bucket:
print(f"SageMaker Bucket: s3://{data_bucket}")

!aws s3 cp docs/lending_package.pdf s3://{data_bucket}/idp/textract/ --only-show-errors

input_file = f"s3://{data_bucket}/idp/textract/{document}"
print(f"Lending Package uploaded to S3: {input_file}")

# Process document
textract_json = call_textract_lending(input_document=input_file, boto3_textract_client=textract)

# Print the page number and classification for each page
results = textract_json['Results']

for page in results:
    print(f"Page Number: {page['Page']} Page Classification: {page['PageClassification']['PageType']}")

This is the code I used and the result I got.

I have checked the code for call_textract_lending at https://github.com/aws-samples/amazon-textract-textractor/blob/master/caller/textractcaller/t_call.py#L19 and the AWS documentation at https://docs.aws.amazon.com/textract/latest/dg/API_Prediction.html, and neither one lists the possible values. Is there a way to find this list?

1 Answer

OK, so I found that the list of document types is available at https://docs.aws.amazon.com/textract/latest/dg/lending-response-objects.html under "Document Types". You can also download it from there.
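To see which of those document types actually occur in a given package, something like the following may help. It is only a minimal sketch: it assumes each entry in textract_json['Results'] carries a PageClassification whose PageType is a list of predictions with a "Value" key, and the helper name count_page_types and the sample values ("PAYSLIPS", "CHECKS") are made up for illustration.

```python
from collections import Counter


def count_page_types(results):
    """Count how often each predicted page type appears in the results."""
    counts = Counter()
    for page in results:
        # Assumption: PageType is a list of predictions, each with a "Value" key.
        predictions = page.get("PageClassification", {}).get("PageType", [])
        for prediction in predictions:
            counts[prediction["Value"]] += 1
    return counts


# Hypothetical sample shaped like textract_json['Results'];
# the type names are illustrative, not taken from a real response.
sample_results = [
    {"Page": 1, "PageClassification": {"PageType": [{"Value": "PAYSLIPS", "Confidence": 99.1}]}},
    {"Page": 2, "PageClassification": {"PageType": [{"Value": "CHECKS", "Confidence": 98.4}]}},
    {"Page": 3, "PageClassification": {"PageType": [{"Value": "PAYSLIPS", "Confidence": 97.7}]}},
]

page_type_counts = count_page_types(sample_results)
for page_type, count in sorted(page_type_counts.items()):
    print(f"{page_type}: {count}")
```

With a real response you would pass textract_json['Results'] instead of sample_results. This only reports the types present in that one document, so the documentation page is still the place to look for the complete list.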
