I am parsing some PDF files with LlamaParse in Python, using the code below:
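import os
import pandas as pd
import nest_asyncio
from llama_parse import LlamaParse

# allow nested event loops (needed when running inside a notebook)
nest_asyncio.apply()

key_input = "some_key_id"
# the environment variable name is LLAMA_CLOUD_API_KEY (it was misspelled as LLMA_...)
os.environ["LLAMA_CLOUD_API_KEY"] = key_input

# run the LlamaParse parsing
doc_parsed = LlamaParse(result_type="markdown", api_key=key_input).load_data(r"Path\myfile.pdf")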
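When I run the same code now, the parsed result for the same document is different. The difference is in the | separators in the table text and in the line breaks.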
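Is there a way to get the old result back from LlamaParse, or to fix some parameter so that it uses the same model or the same method and returns the same result every time, so that I can build my analysis on top of it with the same code logic?
LlamaParse result from last month: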
print(doc_parsed[5].text[:1000])
# Information
|Name|: Mr. XXX|
|---|---|
|Age/Sex|: XX YRS/M|
|Lab Id.|: 0124080X|
|Refered By|: Self|
|Sample Collection On|: 03/Aug/2024 08:30AM|
|Collected By|: XXX|
|Sample Lab Rec. On|: 03/Aug/2024 11:50 AM|
|Collection Mode|: HOME COLLECTION|
|Reporting On|: 03/Aug/2024 02:48 PM|
|BarCode|: XXX|
# Test Results
|Test Name|Result|Biological Ref. Int.|Unit|
|---|---|---|---|
LlamaParse result now, on the same PDF:
print(doc_parsed[5].text[:1000])
# Report
Name: Mr. XXX
Age/Sex: XXX YRS/M
Lab Id: 0124080X
Referred By: Self
Sample Collection On: 03/Aug/2024 08:30 AM
Collected By: XXX
Sample Lab Rec. On: 03/Aug/2024 11:50 AM
Collection Mode: HOME COLLECTION
Reporting On: 03/Aug/2024 02:48 PM
BarCode: XXX
# Test Results
Test Name
Result
Biological Ref. Int.
Unit
Desired result:
# The part above doesn't matter, but Test Results should be separated by |
# Test Results
|Test Name|Result|Biological Ref. Int.|Unit|
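If the parser keeps returning one cell per line, as in the current output above, a fallback is to regroup the lines myself. This is only a minimal sketch: the helper name `rebuild_table` is made up, and it assumes that every cell sits on its own line, in row-major order, with exactly four cells per row. I have only seen the header come back this way, so the data rows may not satisfy that assumption.

```python
def rebuild_table(cells, n_cols=4):
    """Group a flat list of cell strings into a markdown pipe table.

    Assumption: cells are in row-major order with exactly n_cols cells
    per row, and the first row is the header.
    """
    cells = [c.strip() for c in cells if c.strip()]
    if not cells or len(cells) % n_cols != 0:
        raise ValueError(f"{len(cells)} cells do not fit into {n_cols} columns")
    # slice the flat list into rows of n_cols cells each
    rows = [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]
    lines = ["|" + "|".join(row) + "|" for row in rows]
    # markdown needs a separator line right after the header row
    lines.insert(1, "|" + "|".join(["---"] * n_cols) + "|")
    return "\n".join(lines)


flat = ["Test Name", "Result", "Biological Ref. Int.", "Unit",
        "BLOOD UREA", "31.80", "12-43", "mg/dL"]
print(rebuild_table(flat))
```

The function raises instead of guessing when the number of cells does not divide evenly, because a silently shifted column would be worse for the analysis than a failed run.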
Has the underlying model changed, causing the difference? Can I pin the model to get consistent results?
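Independently of whether the model can be pinned, one workaround for already-parsed documents is to store the parsed text on disk and reuse it, so that the analysis always sees the same input. The sketch below is an assumption on my side, not a LlamaParse feature: `parse_with_cache`, `parse_fn`, and the `parse_cache` directory are hypothetical names, and `parse_fn` stands for any callable that takes a path and returns a list of page texts.

```python
import hashlib
import json
from pathlib import Path


def parse_with_cache(pdf_path, parse_fn, cache_dir="parse_cache"):
    """Return page texts for pdf_path, calling parse_fn only on a cache miss.

    The cache key is the SHA-256 of the file content, so a changed PDF
    is parsed again while an unchanged one is always read from disk.
    """
    pdf_path = Path(pdf_path)
    digest = hashlib.sha256(pdf_path.read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    pages = list(parse_fn(pdf_path))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(pages, ensure_ascii=False), encoding="utf-8")
    return pages


# Hypothetical usage with the parser from the question:
# pages = parse_with_cache(
#     r"Path\myfile.pdf",
#     lambda p: [d.text for d in LlamaParse(result_type="markdown",
#                                           api_key=key_input).load_data(str(p))],
# )
```

This does not make new documents parse consistently; it only freezes the result for files that were parsed once.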
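Adding a parsing_instruction that asks for table data separated by | helped to some extent in producing the desired result, but I am not sure whether the results will stay consistent over time when relying on instructions.
Other answers are also welcome, and I am open to better ones.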
# parsing instruction
parsingInstruction2 = """The provided document is a Report.
It should contain tables.
Try to reconstruct the table data into four columns each seperated by |."""
# parse function
doc_parsed_13Sep2 = LlamaParse(
    result_type="markdown",
    api_key=key_input,
    parsing_instruction=parsingInstruction2,
).load_data(r"Path\myfile.pdf")
Output:
# Report
table {
width: 100%;
border-collapse: collapse;
}
th, td {
border: 1px solid black;
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
}
Name: Mr. XXX
Age/Sex: XXX YRS/M
Lab Id: 0124080X
Referred By: Self
Sample Collection On: 03/Aug/2024 08:30AM
Collected By: XXX
Sample Lab Rec. On: 03/Aug/2024 11:50 AM
Collection Mode: HOME COLLECTION
Reporting On: 03/Aug/2024 03:24 PM
BarCode: XXX
# Test Results
|Test Name|Result|Biological Ref. Int.|Unit|
|---|---|---|---|
|BLOOD UREA|31.80|12-43|mg/dL|
|BLOOD UREA NITROGEN (BUN)|15|6 - 21|mg/dl|
|SERUM CREATININE|1.10|0.9 - 1.3|mg/dL|
|SERUM URIC ACID|5.8|3.5-7.2|mg/dL|
|UREA / CREATININE RATIO|28.91|23 - 33|Ratio|
|BUN / CREATININE RATIO|13.51|5.5 - 19.2|Ratio|
|INORGANIC PHOSPHORUS|3.63|2.5-4.5|mg/dL|
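Since I cannot tell whether instruction-based output will stay stable, a cheap safeguard is to check each parsed page for the expected table header before running the analysis. A minimal sketch, assuming the four column names shown above; `EXPECTED_HEADER`, `split_row`, and `has_expected_table` are names made up for this example.

```python
EXPECTED_HEADER = ["Test Name", "Result", "Biological Ref. Int.", "Unit"]


def split_row(line):
    """Split one markdown pipe-table line into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_separator(cells):
    """True for a row such as |---|---| (dashes, optionally with colons)."""
    return all("-" in c and set(c) <= set("-:") for c in cells)


def has_expected_table(markdown_text, expected=EXPECTED_HEADER):
    """True if the expected header row is directly followed by a separator row."""
    lines = markdown_text.splitlines()
    for current, following in zip(lines, lines[1:]):
        if split_row(current) == expected:
            sep = split_row(following)
            if len(sep) == len(expected) and is_separator(sep):
                return True
    return False


good = "# Test Results\n|Test Name|Result|Biological Ref. Int.|Unit|\n|---|---|---|---|"
flattened = "# Test Results\nTest Name\nResult\nBiological Ref. Int.\nUnit"
print(has_expected_table(good), has_expected_table(flattened))
```

A page that fails the check can be re-parsed or flagged for manual review instead of silently feeding broken rows into the analysis.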
Update: I changed the instruction to ask for fields separated by |:
parsingInstruction3 = """The provided document is a Report.
It should contain tables.
Try to reconstruct the data with fields seperated by |."""
Output:
# TEST REPORT
|Name|Mr. XXX|
|---|---|
|Age/Sex|XXX YRS/M|
|Lab Id.|0124080X|
|Referred By|Self|
|Sample Collection On|03/Aug/2024 08:30 AM|
|Collected By|XXX|
|Sample Lab Rec. On|03/Aug/2024 11:50 AM|
|Collection Mode|HOME COLLECTION|
|Reporting On|03/Aug/2024 03:24 PM|
|BarCode|XXX|
# Test Results
|Test Name|Result|Biological Ref. Int.|Unit|
|---|---|---|---|
|BLOOD UREA|31.80|12-43|mg/dL|
|BLOOD UREA NITROGEN (BUN)|15|6 - 21|mg/dL|
|SERUM CREATININE|1.10|0.9 - 1.3|mg/dL|
|SERUM URIC ACID|5.8|3.5-7.2|mg/dL|
|UREA / CREATININE RATIO|28.91|23 - 33|Ratio|
|BUN / CREATININE RATIO|13.51|5.5 - 19.2|Ratio|
|INORGANIC PHOSPHORUS|3.63|2.5-4.5|mg/dL|
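To make the downstream analysis less sensitive to small formatting differences (for example extra spaces around cells), the table can be turned into records instead of being handled as raw text. This is a sketch under the assumption that the table of interest is the one whose header row starts with "Test Name"; `parse_pipe_table` is a hypothetical helper name.

```python
def parse_pipe_table(markdown_text, header_first_cell="Test Name"):
    """Return the rows of the table whose header starts with header_first_cell.

    Each row becomes a dict mapping column name to cell text.
    """
    def cells(line):
        line = line.strip()
        # drop exactly one leading and one trailing pipe, keeping empty cells
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        return [c.strip() for c in line.split("|")]

    rows, header = [], None
    for line in markdown_text.splitlines():
        if not line.strip().startswith("|"):
            if header is not None:
                break  # first non-table line after the header ends the table
            continue
        row = cells(line)
        if header is None:
            if row[0] == header_first_cell:
                header = row
            continue
        if all("-" in c and set(c) <= set("-:") for c in row):
            continue  # skip the |---|---| separator row
        if len(row) == len(header):
            rows.append(dict(zip(header, row)))
    return rows


sample = (
    "# Test Results\n"
    "|Test Name|Result|Biological Ref. Int.|Unit|\n"
    "|---|---|---|---|\n"
    "|BLOOD UREA|31.80|12-43|mg/dL|\n"
    "|SERUM CREATININE|1.10|0.9 - 1.3|mg/dL|"
)
rows = parse_pipe_table(sample)
print(rows[0]["Result"])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a DataFrame is needed for the analysis.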