I am using LayoutLM to read receipts and extract text from invoices. I'm using the model "philschmid/lilt-en-funsd" from HuggingFace. Here is the code snippet:
```python
from PIL import Image
import torch

# `model`, `processor`, `draw_boxes` and `extract_text_from_boxes` are defined elsewhere
def run_inference(image_path, model=model, processor=processor, output_image=True):
    # Load image from the path
    image = Image.open(image_path).convert("RGB")
    # Run OCR + tokenization; LiLT only uses text and layout, so drop the pixel values
    encoding = processor(image, return_tensors="pt")
    del encoding["pixel_values"]
    # Get token-level predictions (no gradients needed for inference)
    with torch.no_grad():
        outputs = model(**encoding)
    predictions = outputs.logits.argmax(-1).squeeze().tolist()
    labels = [model.config.id2label[prediction] for prediction in predictions]
    boxes = encoding["bbox"][0]
    model_name = model.name_or_path.split('/')[-1]
    image_with_boxes = draw_boxes(image, boxes, labels)
    if output_image:
        # Keep only the boxes of tokens tagged as the beginning of an answer
        b_answer_boxes = [boxes[i].numpy() for i, label in enumerate(labels) if label == "B-ANSWER"]
        b_answer_texts = extract_text_from_boxes(image, b_answer_boxes, image_path, model_name)
        return image_with_boxes, b_answer_texts
    return image_with_boxes, []
```
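One detail worth checking in `extract_text_from_boxes` (its body is not shown, so this is an assumption about what it does): the boxes in `encoding["bbox"]` produced by the LayoutLM-family processors are normalized to a 0-1000 range, not pixel coordinates. If the helper crops the image with them directly, the crops will be wrong. A minimal sketch of the conversion, using a hypothetical `unnormalize_box` helper:

```python
def unnormalize_box(box, width, height):
    """Map a box from LayoutLM's 0-1000 normalized space to pixel coordinates.

    box: (x0, y0, x1, y1) in the 0-1000 range
    width, height: size of the original image in pixels
    Returns integer pixel coordinates suitable for PIL's Image.crop.
    """
    x0, y0, x1, y1 = box
    return (
        round(width * x0 / 1000),
        round(height * y0 / 1000),
        round(width * x1 / 1000),
        round(height * y1 / 1000),
    )
```

For example, `unnormalize_box(box, *image.size)` would give the pixel box to pass to `image.crop(...)`, since PIL's `image.size` is `(width, height)`.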
The problem is that it does extract the "B-ANSWER" labels correctly, but they are split across multiple boxes, as shown in the image below:
I only want to extract the items, quantities and prices from the receipt. Any help with this would be greatly appreciated, thanks!
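A likely cause of the split boxes: the model labels every token, using the FUNSD BIO scheme (`B-`/`I-` prefixes for HEADER, QUESTION and ANSWER, plus `O`). Filtering on `label == "B-ANSWER"` keeps only the first token of each answer and drops the `I-ANSWER` tokens that continue it. A minimal sketch (the `merge_bio_spans` name is hypothetical, not part of `transformers`) that groups consecutive `B-X`/`I-X` tokens into one entity and takes the union of their boxes:

```python
def merge_bio_spans(labels, boxes):
    """Group consecutive B-X / I-X tokens into one entity and union their boxes.

    labels: list like ["O", "B-ANSWER", "I-ANSWER", ...]
    boxes:  list of [x0, y0, x1, y1], one per label, in the same order
    Returns a list of (entity_type, merged_box) tuples.
    """
    entities = []
    current_type, current_box = None, None
    for label, box in zip(labels, boxes):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != current_type):
            # A B- tag (or an I- tag with no matching open entity) starts a new entity
            if current_type is not None:
                entities.append((current_type, current_box))
            current_type, current_box = label[2:], list(box)
        elif label.startswith("I-"):
            # Continuation of the open entity: grow its box to cover this token
            current_box = [
                min(current_box[0], box[0]),
                min(current_box[1], box[1]),
                max(current_box[2], box[2]),
                max(current_box[3], box[3]),
            ]
        else:
            # "O" closes any open entity
            if current_type is not None:
                entities.append((current_type, current_box))
            current_type, current_box = None, None
    if current_type is not None:
        entities.append((current_type, current_box))
    return entities


labels = ["O", "B-ANSWER", "I-ANSWER", "I-ANSWER", "O", "B-QUESTION", "B-ANSWER"]
boxes = [[0, 0, 0, 0], [10, 10, 50, 20], [52, 10, 90, 20], [92, 11, 120, 21],
         [0, 0, 0, 0], [10, 30, 40, 40], [50, 30, 80, 40]]
print(merge_bio_spans(labels, boxes))
# [('ANSWER', [10, 10, 120, 21]), ('QUESTION', [10, 30, 40, 40]), ('ANSWER', [50, 30, 80, 40])]
```

Inside `run_inference` this could be applied as `merge_bio_spans(labels, encoding["bbox"][0].tolist())`, ideally after dropping the special-token positions, whose boxes are all zeros.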
See https://github.com/microsoft/unilm/issues/328
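```python
# Decode each token id back to text, skipping special tokens.
# Using the tokenizer's own special ids covers both BERT-style ([CLS], [SEP], [PAD])
# and RoBERTa-style (<s>, </s>, <pad>) vocabularies.
special_ids = set(processor.tokenizer.all_special_ids)
word_list = []
for token_id in encoding['input_ids'].squeeze().tolist():
    if token_id not in special_ids:
        word_list.append(processor.decode([token_id]))
word_list1 = ' '.join(word_list)
```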
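Decoding one token at a time and joining with spaces still splits words into subword pieces (e.g. "12", ".", "50"). With a fast tokenizer, `encoding.word_ids(0)` gives the OCR word index of each token position (`None` for special tokens), which can be used to stitch pieces back into whole words. A minimal sketch with a hypothetical `words_from_tokens` helper and made-up example data; it assumes each word takes the label of its first subword token:

```python
def words_from_tokens(tokens, word_ids, labels):
    """Rebuild whole words from subword tokens.

    tokens:   decoded token strings, one per position
    word_ids: word index per position, None for special/padding tokens
    labels:   predicted label per position
    Each word keeps the label of its first subword token.
    """
    words = {}  # word index -> [text, label]; dicts keep insertion order
    for token, word_id, label in zip(tokens, word_ids, labels):
        if word_id is None:
            continue  # skip special tokens such as <s>, </s>, <pad>
        if word_id not in words:
            words[word_id] = [token.strip(), label]
        else:
            words[word_id][0] += token.strip()
    return [(text, label) for text, label in words.values()]


tokens = ["<s>", " TOTAL", " 12", ".", "50", "</s>"]
word_ids = [None, 0, 1, 1, 1, None]
labels = ["O", "B-QUESTION", "B-ANSWER", "I-ANSWER", "I-ANSWER", "O"]
print(words_from_tokens(tokens, word_ids, labels))
# [('TOTAL', 'B-QUESTION'), ('12.50', 'B-ANSWER')]
```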