How can I retrieve the source documents behind an LLM answer with LlamaIndex?


We are loading PDFs into our vector database, and I can get answers from the LLM, but we also need to return the matching page content from the PDF.

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.readers.file import PyMuPDFReader
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from pathlib import Path
from llama_index.core import Settings
from langchain.prompts import PromptTemplate

query = 'What is the topic of the document?'  # Hardcoded query
apiKey = 'xxx'  # Hardcoded Azure OpenAI API key
apiVersion = '2023-07-01'  # Set the appropriate API version
azure_endpoint = 'xxx'  # Your Azure OpenAI endpoint

llm = AzureOpenAI(
    model="xxx",
    deployment_name="xxx",
    api_key=apiKey,
    azure_endpoint=azure_endpoint,
    api_version=apiVersion,
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="Embedding",
    api_key=apiKey,
    azure_endpoint=azure_endpoint,
    api_version=apiVersion,
)
# Register the LLM and embedding model globally

Settings.llm = llm
Settings.embed_model = embed_model


# Load documents using PyMuPDFReader
docs0 = PyMuPDFReader().load(file_path=Path("sample2.pdf"))  # Replace with the correct path if downloading is needed
doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]  # note: joining all pages into one Document drops per-page metadata

# Split documents into chunks
node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(docs)

# Create the embedding function using Azure OpenAI

# Create the vector store index
index = VectorStoreIndex(base_nodes, embed_model=embed_model)
retriever = index.as_retriever(similarity_top_k=2)  # LlamaIndex uses similarity_top_k, not LangChain-style search_kwargs

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. You always HAVE TO say "thanks for asking!" at the end of the answer! 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Create a query engine (note: the LangChain prompt above is never passed to it;
# LlamaIndex expects its own PromptTemplate via the text_qa_template argument)
query_engine = index.as_query_engine()

# Get the answer
result = query_engine.query(query)

print(result)

We need a result like this:

{
    "answer": " Parkinson's disease",
    "context": [
        {
            "name": "qna1718774871.255651\\10.1038_s41531-018-0058-0.pdf",
            "page": 6,
            "pageContent": "ADDITIONAL INFORMATION\nSupplementary information accompanies the paper on the npj Parkinson ’s..."
        },
    ],
    "status": [
        {
            "paper_id": "10.1038/s41531-018-0058-0",
            "status": "Success"
        }
    ]
}

I am using Python, so I would like to know whether there is any library or technique that can produce this kind of result.

python llama-index
1 Answer

We achieved this simply by doing prompt engineering: https://docs.llamaindex.ai/en/stable/examples/prompts/prompts_rag/
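Besides tweaking the prompt, LlamaIndex also exposes the retrieved chunks directly on the response object via `response.source_nodes`; each entry wraps a node with `get_content()` and a `metadata` dict populated by the reader (PyMuPDFReader attaches keys such as `file_path` and `source` for the page label). A minimal sketch of shaping that into the JSON above — the helper name `format_response` and the exact metadata keys are assumptions, so print `node.metadata` in your own setup to confirm:

```python
import json

def format_response(answer, source_nodes):
    """Shape a LlamaIndex response into {"answer": ..., "context": [...]}.

    `source_nodes` is the list from `response.source_nodes`; each item has a
    `.node` with `get_content()` and a `metadata` dict.  The metadata keys
    used below are what PyMuPDFReader typically attaches -- verify them
    against your own reader.
    """
    context = []
    for sn in source_nodes:
        meta = sn.node.metadata
        context.append({
            "name": meta.get("file_path", "unknown"),
            "page": meta.get("source"),  # page label, if the reader kept it
            "pageContent": sn.node.get_content(),
        })
    return {"answer": str(answer), "context": context}

# Usage (after running the code from the question):
# result = query_engine.query(query)
# print(json.dumps(format_response(result.response, result.source_nodes), indent=4))
```

Note that the question's code joins every page into a single `Document`, which discards the per-page metadata; indexing the per-page documents returned by `PyMuPDFReader` directly keeps the page numbers available here. The `status` field from the desired output is not something LlamaIndex produces and would have to be tracked by your own loading code.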
