Getting the full text from LayoutLM


I am using LayoutLM to read receipts and extract the text from invoices. The model I am using is "philschmid/lilt-en-funsd" from HuggingFace. Here is the code snippet:

from PIL import Image

# model, processor, draw_boxes and extract_text_from_boxes are assumed to be defined elsewhere.
def run_inference(image_path, model=model, processor=processor, output_image=True):
    # Load the image from the path
    image = Image.open(image_path).convert("RGB")

    # Get predictions (pixel_values is removed before the forward pass)
    encoding = processor(image, return_tensors="pt")
    del encoding["pixel_values"]
    outputs = model(**encoding)
    predictions = outputs.logits.argmax(-1).squeeze().tolist()
    labels = [model.config.id2label[prediction] for prediction in predictions]
    boxes = encoding["bbox"][0]
    model_name = model.name_or_path.split('/')[-1]

    # Draw once and reuse the result in both branches
    image_with_boxes = draw_boxes(image, boxes, labels)
    if not output_image:
        return image_with_boxes, []

    # Keep only the boxes of tokens labelled B-ANSWER and extract their text
    b_answer_boxes = [boxes[i].detach().numpy() for i, label in enumerate(labels) if label == "B-ANSWER"]
    b_answer_texts = extract_text_from_boxes(image, b_answer_boxes, image_path, model_name)
    return image_with_boxes, b_answer_texts

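The problem is that it does extract the "B-ANSWER" labels correctly, but each answer is split across multiple boxes, as shown in the image below:

[image from the original post]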

I only want to extract the items, quantities, and prices from the receipt. Any help with this would be greatly appreciated, thanks!

python deep-learning ocr huggingface-transformers text-extraction
1 Answer

See https://github.com/microsoft/unilm/issues/328

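# Decode every input token ID back to text, one token at a time
word_list = []
for token_id in encoding['input_ids'].squeeze().tolist():
    word_list.append(processor.decode([token_id]))
# Drop the special tokens and join the rest into one string
# (these are BERT-style names; RoBERTa-based tokenizers use <s>, </s>, <pad> instead)
special_tokens = {'[PAD]', '[CLS]', '[SEP]'}
full_text = ' '.join(x for x in word_list if x not in special_tokens)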
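
The snippet above recovers the full text, but it does not address the split boxes from the question. One possible reason for the fragmentation is that the question keeps only tokens tagged "B-ANSWER" and discards the "I-ANSWER" tokens that continue each answer. Below is a minimal sketch, not part of the original answer, that merges a B- token with the I- tokens that follow it into one entity and takes the union of their boxes. The function name merge_entities is hypothetical, and the sketch assumes word-level tokens with FUNSD-style B-/I-/O labels and (x0, y0, x1, y1) boxes; subword pieces would need to be rejoined into words first.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def merge_entities(
    tokens: Sequence[str],
    labels: Sequence[str],
    boxes: Sequence[Sequence[int]],
) -> List[Tuple[str, str, Box]]:
    """Merge B-/I- tagged tokens into (entity_type, text, union_box) spans.

    Hypothetical helper: assumes one label and one box per word-level token.
    """
    entities = []
    current = None  # [entity_type, list_of_tokens, union_box] while a span is open

    for token, label, box in zip(tokens, labels, boxes):
        prefix, _, kind = label.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != kind)
        )
        if starts_new:
            # Close the previous span (if any) and open a new one
            if current is not None:
                entities.append(current)
            current = [kind, [token], list(box)]
        elif prefix == "I":
            # Continue the open span and grow its box to cover this token
            current[1].append(token)
            b = current[2]
            current[2] = [
                min(b[0], box[0]),
                min(b[1], box[1]),
                max(b[2], box[2]),
                max(b[3], box[3]),
            ]
        else:
            # "O" (or any unknown tag) ends the open span
            if current is not None:
                entities.append(current)
            current = None

    if current is not None:
        entities.append(current)

    return [(kind, " ".join(toks), tuple(b)) for kind, toks, b in entities]


# Made-up example data, for illustration only
example_tokens = ["Total", "Coffee", "Latte", "4.50"]
example_labels = ["B-QUESTION", "B-ANSWER", "I-ANSWER", "B-ANSWER"]
example_boxes = [(10, 10, 50, 20), (60, 10, 100, 20), (105, 10, 140, 22), (200, 10, 230, 20)]
merged = merge_entities(example_tokens, example_labels, example_boxes)
answers = [entity for entity in merged if entity[0] == "ANSWER"]
```

Filtering the merged list for the "ANSWER" type, as in the last line, would give one box per answer instead of one box per word. Whether that lines up with items, quantities, and prices depends on how the model labels the receipt.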