Using a text embedding model locally with Semantic Kernel

Question | Votes: 0 | Answers: 1

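I have been reading Stephen Toub's blog post about building a simple console-based .NET chat application from scratch with Semantic Kernel. I am following the examples, but I want to use Microsoft Phi-3 and the nomic embedding model instead of OpenAI. I was able to recreate the first example from the blog post using the Semantic Kernel Hugging Face connector, but I cannot get the text embedding example to run.

I have downloaded Phi-3 and nomic-embed-text and am serving them from a local server with LM Studio.

This is the code I came up with, using the Hugging Face connector: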

using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using System.Numerics.Tensors;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel.ChatCompletion;

#pragma warning disable SKEXP0070, SKEXP0003, SKEXP0001, SKEXP0011, SKEXP0052, SKEXP0055, SKEXP0050  // Type is for evaluation purposes only and is subject to change or removal in future updates. 

internal class Program
{
    private static async Task Main(string[] args)
    {
        // Initialize the Semantic Kernel builder and register the embedding service.
        IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
        kernelBuilder.Services.ConfigureHttpClientDefaults(c => c.AddStandardResilienceHandler());
        var kernel = kernelBuilder
            .AddHuggingFaceTextEmbeddingGeneration("nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf",
            new Uri("http://localhost:1234/v1"),
            apiKey: "lm-studio",
            serviceId: null)
            .Build();

        var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
        var memoryBuilder = new MemoryBuilder();
        memoryBuilder.WithTextEmbeddingGeneration(embeddingGenerator);
        memoryBuilder.WithMemoryStore(new VolatileMemoryStore());
        var memory = memoryBuilder.Build();
        // Example sentences to store in memory and compare against the input.
        string input = "What is an amphibian?";
        string[] examples = [ "What is an amphibian?",
                              "Cos'è un anfibio?",
                              "A frog is an amphibian.",
                              "Frogs, toads, and salamanders are all examples.",
                              "Amphibians are four-limbed and ectothermic vertebrates of the class Amphibia.",
                              "They are four-limbed and ectothermic vertebrates.",
                              "A frog is green.",
                              "A tree is green.",
                              "It's not easy bein' green.",
                              "A dog is a mammal.",
                              "A dog is a man's best friend.",
                              "You ain't never had a friend like me.",
                              "Rachel, Monica, Phoebe, Joey, Chandler, Ross"];
        for (int i = 0; i < examples.Length; i++)
            await memory.SaveInformationAsync("net7perf", examples[i], $"paragraph{i}");
        var embed = await embeddingGenerator.GenerateEmbeddingsAsync([input]);
        ReadOnlyMemory<float> inputEmbedding = (embed)[0];
        // Generate embeddings for each example sentence.
        IList<ReadOnlyMemory<float>> embeddings = await embeddingGenerator.GenerateEmbeddingsAsync(examples);
        // Print the cosine similarity between the input and each example
        float[] similarity = embeddings.Select(e => TensorPrimitives.CosineSimilarity(e.Span, inputEmbedding.Span)).ToArray();
        similarity.AsSpan().Sort(examples.AsSpan(), (f1, f2) => f2.CompareTo(f1));
        Console.WriteLine("Similarity Example");
        for (int i = 0; i < similarity.Length; i++)
            Console.WriteLine($"{similarity[i]:F6}   {examples[i]}");
    }
}

On the line:

for (int i = 0; i < examples.Length; i++)
    await memory.SaveInformationAsync("net7perf", examples[i], $"paragraph{i}");

I get the following exception:

JsonException: The JSON value could not be converted to Microsoft.SemanticKernel.Connectors.HuggingFace.Core.TextEmbeddingResponse

Does anyone know what I am doing wrong?

I have installed the following NuGet packages in the project:

Id                                               Version          Project Name
Microsoft.SemanticKernel.Core                    {1.15.0}         Local LLM Application
Microsoft.SemanticKernel.Plugins.Memory          {1.15.0-alpha}   Local LLM Application
Microsoft.Extensions.Http.Resilience             {8.6.0}          Local LLM Application
Microsoft.Extensions.Logging                     {8.0.0}          Local LLM Application
Microsoft.SemanticKernel.Connectors.HuggingFace  {1.15.0-preview} Local LLM Application
Newtonsoft.Json                                  {13.0.3}         Local LLM Application
Microsoft.Extensions.Logging.Console             {8.0.0}          Local LLM Application
c# large-language-model semantic-kernel semantic-kernel-plugins
1 Answer

Votes: 0

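I don't think you can use AddHuggingFaceTextEmbeddingGeneration with an embedding model served by LM Studio out of the box. The reason is that HuggingFaceClient rewrites the URL internally and appends pipeline/feature-extraction/ followed by the model ID: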

 private Uri GetEmbeddingGenerationEndpoint(string modelId)
     => new($"{this.Endpoint}{this.Separator}pipeline/feature-extraction/{modelId}");

This matches the error message I get in the LM Studio console:

[2024-07-03 22:18:19.898] [ERROR] Unexpected endpoint or method. (POST /v1/embedding/pipeline/feature-extraction/nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q5_K_M.gguf). Returning 200 anyway
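
The path in that log line follows directly from the string interpolation in GetEmbeddingGenerationEndpoint. The sketch below reproduces the composition with plain BCL types so the mismatch is visible. EndpointDemo, HuggingFaceStyle, and OpenAiStyle are hypothetical names, the separator rule is an assumption about what this.Separator does, and /v1/embeddings is assumed to be the OpenAI-compatible route that LM Studio serves.

```csharp
using System;

internal static class EndpointDemo
{
    // Mirrors the interpolation in the quoted GetEmbeddingGenerationEndpoint.
    // Assumption: the separator is "/" unless the endpoint already ends with one.
    public static Uri HuggingFaceStyle(Uri endpoint, string modelId)
    {
        string separator = endpoint.AbsolutePath.EndsWith("/", StringComparison.Ordinal) ? "" : "/";
        return new Uri($"{endpoint}{separator}pipeline/feature-extraction/{modelId}");
    }

    // Assumption: LM Studio exposes an OpenAI-compatible embeddings route under /v1.
    public static Uri OpenAiStyle(Uri server) => new Uri(server, "/v1/embeddings");

    private static void Main()
    {
        // Endpoint and model ID taken from the log line above.
        var endpoint = new Uri("http://localhost:1234/v1/embedding");
        const string modelId = "nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q5_K_M.gguf";

        Console.WriteLine(HuggingFaceStyle(endpoint, modelId).AbsolutePath); // path the connector requests
        Console.WriteLine(OpenAiStyle(endpoint).AbsolutePath);               // path the server is assumed to expect
    }
}
```

The first printed path is the one LM Studio rejects in the log; the second has no pipeline/feature-extraction segment and no model ID in the path, since an OpenAI-style request carries the model name in the JSON body.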


To make it work, the URL has to be changed.
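
The answer does not say how to change the URL. One possible direction, offered as a sketch and not as the answerer's fix, is to bypass the Hugging Face connector and post to the embeddings route directly. This assumes LM Studio serves POST /v1/embeddings and returns the OpenAI response shape (a data array whose items each carry an embedding array). LmStudioEmbeddings, ParseEmbeddings, and EmbedAsync are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

internal static class LmStudioEmbeddings
{
    // Extracts data[i].embedding from an OpenAI-style embeddings response (assumed shape).
    public static float[][] ParseEmbeddings(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("data")
            .EnumerateArray()
            .Select(item => item.GetProperty("embedding")
                .EnumerateArray()
                .Select(value => value.GetSingle())
                .ToArray())
            .ToArray();
    }

    // Posts the inputs to v1/embeddings, resolved against the client's BaseAddress.
    public static async Task<float[][]> EmbedAsync(HttpClient http, string model, string[] inputs)
    {
        using HttpResponseMessage response =
            await http.PostAsJsonAsync("v1/embeddings", new { model, input = inputs });
        response.EnsureSuccessStatusCode();
        return ParseEmbeddings(await response.Content.ReadAsStringAsync());
    }

    private static async Task Main()
    {
        // Server address and model ID are the ones used in the question.
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:1234/") };
        float[][] vectors = await EmbedAsync(
            http,
            "nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf",
            new[] { "What is an amphibian?" });
        Console.WriteLine($"Received {vectors.Length} embedding(s) of length {vectors[0].Length}");
    }
}
```

To use these vectors with MemoryBuilder, the call would still need to be wrapped in a class implementing ITextEmbeddingGenerationService; that wrapper is not shown here.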
