Feeding a complete GitHub repository to an LLM using LangChain

Problem description (votes: 0, answers: 2)

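I am building a code analyzer that evaluates an entire codebase with an LLM and gives me insight into it, such as the technologies used, the algorithms used, and more about the coder. I came up with a solution using LangChain, but it does not get the full context of the repository. For example, I have 3 files: one for the Linux shell, one for Windows PowerShell, and a README. When I query it, I get a result like the following: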

Based on the code snippet provided, it appears to be a comment block from a software project licensed under the GNU Affero General Public License (AGPL). It does not contain any executable code, so it is not possible to directly assess the programming skills or specific technologies used by the author from this code alone.\n\nHowever, the presence of this comment block indicates that the person writing the code was aware of the importance of open source licensing and the requirements for distributing software under such a license. It also shows that they were able to write clear and concise documentation, which is an essential skill for any software developer.\n\nFurthermore, the code snippet mentions specific terms like "Corresponding Source" and "System Libraries," which suggests that the person has a good understanding of software licensing and the nuances of open source software development. Overall, while the code itself does not provide much insight into the person\'s technical abilities, the comment block indicates a solid foundation in open source principles and documentation skills.\n\nTo gain a more comprehensive understanding of their abilities, it would be necessary to review additional code samples, ask about their experience with specific technologies and programming languages, and assess their problem-solving and communication skills during an interview. 
It is picking up the license file instead of the code files. Here is my code:

from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain,SimpleSequentialChain
from langchain.chains import ConversationalRetrievalChain
from langchain_core.prompts import PromptTemplate

import os
AT="hf_qxxxxxxxxxxxxxxxxxxxxxx"
# hf = HuggingFacePipeline.from_model_id(
#     model_id="nomic-ai/gpt4all-j",
#     task="text-generation",
#     pipeline_kwargs={"max_new_tokens": 10},
# )
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(repo_id=repo_id, huggingfacehub_api_token=AT)
# llm_chain = LLMChain(prompt=prompt, llm=llm)
embeddings = HuggingFaceEmbeddings()

# Initialize the vector store (from_texts expects a list of strings, not a single string)
vectorstore = FAISS.from_texts(["somen is the author of all this code."], embeddings)

# Define a list of supported file extensions
supported_extensions = ['.py', '.js','.sh', '.java', '.cpp', '.cs', '.go', '.rb', '.php', '.swift', '.kt', '.ts', '.jsx', '.vue']

# Define a list of directories to exclude
excluded_dirs = ['node_modules', '.pub-cache', '__pycache__']

text_splitter = CharacterTextSplitter(separator=" ", chunk_size=1000, chunk_overlap=200)

all_texts = ''
loader = DirectoryLoader("/content/RTCserver",loader_cls=TextLoader,
                             recursive=True, show_progress=True,
                             use_multithreading=True,max_concurrency=8)
# raw_documents = loader.load()
# for root, dirs, files in os.walk("/content/sshedit"):
#     dirs[:] = [d for d in dirs if d not in excluded_dirs]  # Exclude directories
#     for file in files:
#         if any(file.endswith(ext) for ext in supported_extensions):
#             file_path = os.path.join(root, file)
#             with open(file_path, "r") as f:
#                 code = f.read()
#                 all_texts += code


documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1080, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
# Add texts once gathered
vectorstore=FAISS.from_documents(docs, embeddings)

# Initialize the agent and retrieval QA chain
retriever = vectorstore.as_retriever()
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, verbose=True)
chat_history=[]
# Query the agent to extract the developer's skills
query = "<s>by looking this code what can you say about my learnings.</s> [INST] let assume i am a interviewer for a software developer role and i have seen this code written by someone so I have to check this code and tell what are the learning the person have like what the person can actually do.[/INST]"
result = qa.invoke({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
print(result)
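The loader above reads every file under the directory, including the license text, so one way to keep the license out of the index is to filter files before they are loaded. The following is a minimal sketch of that idea using only the standard library, modeled on the commented-out `os.walk` loop. The function name `collect_source_files` is hypothetical, and the `.ps1` extension is an assumed addition for the PowerShell script mentioned in the question.

```python
import os

# Extensions treated as source code. ".ps1" is an assumed addition for the
# PowerShell script; the others come from the question's supported_extensions.
SUPPORTED_EXTENSIONS = (
    ".py", ".js", ".sh", ".ps1", ".java", ".cpp", ".cs", ".go",
    ".rb", ".php", ".swift", ".kt", ".ts", ".jsx", ".vue",
)
EXCLUDED_DIRS = {"node_modules", ".pub-cache", "__pycache__"}


def collect_source_files(repo_root):
    """Return sorted paths of source files under repo_root (hypothetical helper)."""
    paths = []
    for root, dirs, files in os.walk(repo_root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            # Files without a supported extension (LICENSE, README.md) are skipped.
            if name.lower().endswith(SUPPORTED_EXTENSIONS):
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Each returned path could then be loaded with `TextLoader(path).load()` and the resulting documents passed to `text_splitter.split_documents` as before.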
python artificial-intelligence langchain
2 Answers
0
votes

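It sounds like you may be using an update instead of an insert. Also check whether the database has any triggers set to delete rows on new inserts.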


0
votes

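It would be great if you could share some code, especially the code that inserts the data. From what I can tell, there may be one problem.

Instead of inserting, you are updating the same row. In that case, you are probably using something like

UPDATE MESSAGES SET Message = ?,User_id = ? WHERE id = ?

when you should be using

INSERT INTO MESSAGES SET Message = ?,User_id = ?

This answer may not be relevant to your question, but again, the information provided is limited.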
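The difference between the two statements can be seen in a small sketch using Python's built-in `sqlite3` module. The table layout is an assumption based on the column names in the statements above. SQLite does not accept the MySQL-style `INSERT ... SET` form, so the insert is written with `VALUES` here.

```python
import sqlite3

# In-memory database with an assumed MESSAGES layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MESSAGES (id INTEGER PRIMARY KEY, Message TEXT, User_id INTEGER)"
)
conn.execute("INSERT INTO MESSAGES (Message, User_id) VALUES (?, ?)", ("first", 1))

# UPDATE rewrites the existing row, so the earlier message is lost.
conn.execute(
    "UPDATE MESSAGES SET Message = ?, User_id = ? WHERE id = ?", ("second", 1, 1)
)
rows_after_update = conn.execute(
    "SELECT Message FROM MESSAGES ORDER BY id"
).fetchall()

# INSERT appends a new row and leaves the existing one untouched.
conn.execute("INSERT INTO MESSAGES (Message, User_id) VALUES (?, ?)", ("third", 1))
rows_after_insert = conn.execute(
    "SELECT Message FROM MESSAGES ORDER BY id"
).fetchall()
conn.close()
```

After the update the table still holds a single row, while the insert brings it to two.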
