Speech-to-text integration issue in Azure Communication Services

Question

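Context: We are building a bot that handles phone calls using Azure Communication Services (ACS) and Azure Speech Services. The bot asks questions (via TTS) and captures the user's responses with speech-to-text (STT).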

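Challenges:

The user's response is not captured. Although `InitialSilenceTimeout` is set to 10 seconds, the bot moves on to the next question after 1-2 seconds without recognizing any speech.
The bot does not re-prompt the user, even when no response is detected.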

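Help needed:

How can we ensure accurate, real-time speech-to-text capture during an ACS phone call?
Is there a better configuration or an alternative approach for speech recognition in ACS?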

Additional background:

We followed the official ACS C# samples.
We are using Azure Speech Services and the ACS SDK.

Code snippet:

// Recognize user speech
async Task<string> RecognizeSpeechAsync(CallMedia callConnectionMedia, string callerId, ILogger logger)
{
    // Configure recognition options
    var recognizeOptions = new CallMediaRecognizeSpeechOptions(
        targetParticipant: CommunicationIdentifier.FromRawId(callerId))
    {
        InitialSilenceTimeout = TimeSpan.FromSeconds(10), // Wait up to 10 seconds for the user to start speaking
        EndSilenceTimeout = TimeSpan.FromSeconds(5),     // Wait up to 5 seconds of silence before considering the response complete
        OperationContext = "SpeechRecognition"
    };

    try
    {
        // Start speech recognition
        var result = await callConnectionMedia.StartRecognizingAsync(recognizeOptions);

        // Handle recognition success
        if (result is Response<StartRecognizingCallMediaResult>)
        {
            logger.LogInformation($"Result: {result}");
            logger.LogInformation("Recognition started successfully.");
            // Simulate capturing response (replace with actual recognition logic)
            return "User response captured"; // Replace with actual response text from recognition
        }

        logger.LogWarning("Recognition failed or timed out.");
        return string.Empty; // Return empty if recognition fails
    }
    catch (Exception ex)
    {
        logger.LogError($"Error during speech recognition: {ex.Message}");
        return string.Empty;
    }
}

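What we have done:

Created an ACS resource and acquired a valid phone number.
Set up an event subscription to handle incoming-call callbacks.
Integrated Azure Speech Services for STT in C#.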

What works so far:

Calls connect successfully through ACS.
TTS prompts generated from an Excel file are played.

Answer 1

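ACS provides a set of speech recognition options, and it is essential that they are configured correctly.

`InitialSilenceTimeout`: set a reasonable duration (for example, 5-10 seconds) so the user has time to start responding.
`EndSilenceTimeout`: set a duration that tolerates natural pauses in speech (for example, 3-5 seconds).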

// Speech recognition with keypad (DTMF) input as a fallback
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(
    targetParticipant: CommunicationIdentifier.FromRawId(callerId),
    maxTonesToCollect: 1) // Number of DTMF digits accepted as the fallback answer
{
    InitialSilenceTimeout = TimeSpan.FromSeconds(8), // Time allowed for the caller to start responding
    EndSilenceTimeout = TimeSpan.FromSeconds(5)      // Pause length that ends the spoken response
};
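
One likely reason the bot moves on after a second or two is that `StartRecognizingAsync` only starts the operation. The recognized text arrives later in a `RecognizeCompleted` (or `RecognizeFailed`) callback event. The sketch below waits for that event and re-prompts when no speech comes back. It assumes the webhook endpoint passes incoming callback events to the SDK event processor (`callAutomationClient.GetEventProcessor().ProcessEvents(...)`) and that the ACS resource is connected to an Azure AI services resource. `SpeechPrompts`, `RecognizeWithRetriesAsync`, `prompt`, and `maxAttempts` are illustrative names, not part of the SDK.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Communication;
using Azure.Communication.CallAutomation;
using Microsoft.Extensions.Logging;

public static class SpeechPrompts
{
    // Hypothetical helper: plays the prompt, waits for the caller's answer,
    // and repeats the prompt up to maxAttempts times if nothing is recognized.
    public static async Task<string> RecognizeWithRetriesAsync(
        CallMedia callMedia, string callerId, PlaySource prompt, ILogger logger, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var options = new CallMediaRecognizeSpeechOptions(CommunicationIdentifier.FromRawId(callerId))
            {
                Prompt = prompt, // Replayed on every attempt, which acts as the re-prompt
                InitialSilenceTimeout = TimeSpan.FromSeconds(10),
                EndSilenceTimeout = TimeSpan.FromSeconds(5),
                OperationContext = $"SpeechRecognition-{attempt}"
            };

            // Returns as soon as recognition has started, not when it has finished.
            StartRecognizingCallMediaResult started =
                (await callMedia.StartRecognizingAsync(options)).Value;

            // Assumption: callback events are being fed to the event processor,
            // otherwise this wait cannot complete with a result.
            StartRecognizingEventResult outcome = await started.WaitForEventProcessorAsync();

            if (outcome.IsSuccess
                && outcome.SuccessResult.RecognizeResult is SpeechResult speech
                && !string.IsNullOrWhiteSpace(speech.Speech))
            {
                return speech.Speech; // Transcribed caller response
            }

            logger.LogWarning("Attempt {Attempt} returned no speech: {Reason}", attempt,
                outcome.FailureResult?.ResultInformation?.Message ?? "no speech result");
        }

        return string.Empty; // Caller never gave a usable answer
    }
}
```

If the application handles callbacks manually instead of through the event processor, the same logic applies: move on to the next question only from the `RecognizeCompleted` handler, and replay the prompt from the `RecognizeFailed` handler.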

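Alternatively, you can route the call audio to the Azure Speech SDK for more control and accuracy.

Use the ACS `Call Media` API to route the real-time audio stream from the phone call, and configure the Speech SDK to transcribe that stream in real time.

Code: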

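using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio; // AudioConfig lives in this namespace

var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourRegion");
speechConfig.SpeechRecognitionLanguage = "en-US"; // Set locale

// yourAudioStream must be an AudioInputStream (for example, a PushAudioInputStream) carrying the call audio
using var audioConfig = AudioConfig.FromStreamInput(yourAudioStream);
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Raised once for each finalized utterance
recognizer.Recognized += (s, e) => {
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"Recognized: {e.Result.Text}");
    }
};

// Keeps transcribing until StopContinuousRecognitionAsync is called
await recognizer.StartContinuousRecognitionAsync();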
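
The snippet above leaves `yourAudioStream` undefined. A minimal sketch of one way to supply it is a Speech SDK push stream that the application fills with raw audio as it arrives from the call. `CallAudioBridge` and `OnAudioChunk` are hypothetical names, and the 16 kHz, 16-bit, mono PCM format is an assumption that has to match whatever format the call audio is actually delivered in.

```csharp
using System;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical adapter between incoming call audio and the Speech SDK.
public sealed class CallAudioBridge : IDisposable
{
    private readonly PushAudioInputStream _pushStream;

    // Pass this to the SpeechRecognizer constructor.
    public AudioConfig AudioConfig { get; }

    public CallAudioBridge(uint samplesPerSecond = 16000)
    {
        // Assumption: the call audio is 16-bit mono PCM at the given sample rate.
        var format = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, 16, 1);
        _pushStream = AudioInputStream.CreatePushStream(format);
        AudioConfig = AudioConfig.FromStreamInput(_pushStream);
    }

    // Call this for every chunk of raw PCM bytes received from the call.
    public void OnAudioChunk(byte[] pcmBytes) => _pushStream.Write(pcmBytes);

    // Call this when the call ends so the recognizer can flush its final result.
    public void Complete() => _pushStream.Close();

    public void Dispose()
    {
        AudioConfig.Dispose();
        _pushStream.Dispose();
    }
}
```

With this in place, `new SpeechRecognizer(speechConfig, bridge.AudioConfig)` takes the place of the `audioConfig` line, and the code that receives the call audio forwards each chunk to `bridge.OnAudioChunk`.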
