So I have a speech recognition app in Unity that uses Microsoft Azure and invokes recognition on a button click. As far as I know, you need something to trigger recognition: either a button click or key press for RecognizeOnceAsync(), or, in the case of StartContinuousRecognitionAsync(), a keyword that stops recognition and starts processing. My question is: is it possible to just speak into the microphone and have the speech data sent off for analysis, as close to a natural conversation as possible? Alternatively, could recognition be active for a set window, say stop recording after ten seconds, receive the response, and then start listening again?
// Attach the handler before starting, so early results aren't missed.
speechRecognizer.Recognized += (s, e) =>
{
    var result = e.Result;
    Debug.Log(result.Text);
};

await speechRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
Single-shot version:

var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
string newMessage = string.Empty;
if (result.Reason == ResultReason.RecognizedSpeech)
{
    Debug.Log(result.Text);
}
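For the timed variant, something like this sketch is what I have in mind (untested; keepListening is just a hypothetical flag I'd toggle from the UI):

while (keepListening)  // hypothetical flag, controlled elsewhere in my app
{
    // Listen for a fixed ten-second window...
    await speechRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
    await Task.Delay(TimeSpan.FromSeconds(10));
    await speechRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);

    // ...then process whatever the Recognized handler buffered,
    // play back the response, and loop around to listen again.
}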
Right now I have a working version that waits for a keyword before sending the data, and another version that sends the speech data on a button click. I'd like to know whether it's possible to just speak into the microphone and receive a TTS response, as close to a real conversation as possible.
Yes, you can create a more natural conversational experience with Azure Cognitive Services by having the system continuously listen for speech input and process it as it arrives.
The following code performs continuous speech recognition and synthesis using Cognitive Services.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    // Replace with your own subscription key and service region.
    static string speechKey = "YourSubscriptionKey";
    static string speechRegion = "YourServiceRegion";

    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        using var recognizer = new SpeechRecognizer(speechConfig);
        StringBuilder recognizedTextBuffer = new StringBuilder();

        // Subscribe to events before starting recognition.
        recognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"Recognizing: {e.Result.Text}");
        };

        recognizer.Recognized += async (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                recognizedTextBuffer.Append(e.Result.Text + " ");

                // Example action: check for a specific keyword.
                if (e.Result.Text.ToLower().Contains("stop"))
                {
                    Console.WriteLine("Stopping recognition...");
                    await recognizer.StopContinuousRecognitionAsync();
                }
                else
                {
                    // Synthesize the recognized text.
                    await SynthesizeSpeechAsync(e.Result.Text);
                }
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine("No speech could be recognized.");
            }
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("Session stopped.");
            Console.WriteLine("Press any key to exit...");
        };

        // Start continuous recognition.
        await recognizer.StartContinuousRecognitionAsync();
        Console.WriteLine("Say something... Say 'stop' to end the recognition.");
        Console.ReadKey();

        // Optionally stop recognition when exiting the application.
        await recognizer.StopContinuousRecognitionAsync();
    }
    static async Task SynthesizeSpeechAsync(string text)
    {
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        speechConfig.SpeechSynthesisVoiceName = "en-US-AvaMultilingualNeural";
        using var synthesizer = new SpeechSynthesizer(speechConfig);

        var result = await synthesizer.SpeakTextAsync(text);
        switch (result.Reason)
        {
            case ResultReason.SynthesizingAudioCompleted:
                Console.WriteLine($"Speech synthesized for text: [{text}]");
                break;
            case ResultReason.Canceled:
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                Console.WriteLine($"Synthesis canceled: Reason={cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"ErrorCode={cancellation.ErrorCode}");
                    Console.WriteLine($"ErrorDetails={cancellation.ErrorDetails}");
                }
                break;
        }
    }
}
Output:

![enter image description here](https://i.imgur.com/NHkjzTI.png)
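One caveat if you use this for a real back-and-forth conversation: the microphone stays open while SpeakTextAsync is playing, so the recognizer can pick up its own TTS audio. A common pattern (a minimal sketch, not something the SDK handles for you) is to pause recognition around the synthesis call inside the Recognized handler:

await recognizer.StopContinuousRecognitionAsync();   // stop listening while we speak
await SynthesizeSpeechAsync(e.Result.Text);
await recognizer.StartContinuousRecognitionAsync();  // resume listening for the user's reply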