How to get timestamps for audio synthesized with Azure Text to Speech

Question

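I am trying to get timestamps for the audio generated by Azure Text to Speech. I have configured the speech config correctly, but I cannot find any timestamp-related property in the response object. My code is below.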

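import azure.cognitiveservices.speech as speechsdk

# speech_key and service_region are assumed to be defined elsewhere
# (the key and region of your Azure Speech resource).
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)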

speech_config.speech_synthesis_voice_name = "en-US-AndrewMultilingualNeural"
speech_config.request_word_level_timestamps()

text = "Hi"

# use the default speaker as audio output.
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

result = speech_synthesizer.speak_text_async(text).get()
print(result.properties)

Answer 1

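To get timestamps for audio synthesized with Azure Text to Speech, use the synthesizer's word boundary and sentence boundary events. The timestamps are not exposed through result.properties, and request_word_level_timestamps() applies, as far as I know, to speech recognition results rather than to synthesis.

Use word boundary or sentence boundary events, with plain text or SSML:

Azure Text to Speech reports the word and sentence boundaries of the synthesized speech as events, not as SSML tags. Each event carries the audio offset (the timestamp) of the word or sentence it describes, whether the input is plain text or SSML (Speech Synthesis Markup Language). Enable timestamps in the API request: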

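You need to request word boundary or sentence boundary events in the synthesis request. In the Python SDK this is done through the PropertyId values SpeechServiceResponse_RequestWordBoundary and SpeechServiceResponse_RequestSentenceBoundary, set to "true" on the speech config, together with a handler connected to the synthesizer's synthesis_word_boundary event.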
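For the Python SDK used in the question, a minimal sketch of collecting word timestamps could look like the following. It assumes the synthesizer's synthesis_word_boundary event, whose arguments report audio_offset in 100-nanosecond ticks and duration as a datetime.timedelta (true of recent SDK versions as far as I know; check yours). WordTiming, to_word_timing and synthesize_with_timestamps are hypothetical helper names introduced only for this illustration.

```python
from dataclasses import dataclass

# audio_offset is reported in ticks; one tick is 100 nanoseconds.
TICKS_PER_SECOND = 10_000_000


@dataclass
class WordTiming:
    text: str
    start_s: float
    duration_s: float


def to_word_timing(text, audio_offset_ticks, duration_s):
    """Convert the fields of one boundary event into seconds."""
    return WordTiming(text, audio_offset_ticks / TICKS_PER_SECOND, duration_s)


def synthesize_with_timestamps(text, speech_key, service_region):
    """Speak `text` and return (result, list of WordTiming)."""
    # Imported here so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_synthesis_voice_name = "en-US-AndrewMultilingualNeural"
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    timings = []

    def on_word_boundary(evt):
        # Assumption: evt.duration is a datetime.timedelta.
        timings.append(
            to_word_timing(evt.text, evt.audio_offset, evt.duration.total_seconds())
        )

    # Connect the handler before starting synthesis so no event is missed.
    synthesizer.synthesis_word_boundary.connect(on_word_boundary)
    result = synthesizer.speak_text_async(text).get()
    return result, timings
```

Calling synthesize_with_timestamps("Hi", speech_key, service_region) should return the usual result object plus one WordTiming per reported boundary. To also receive sentence boundaries, the speech config would additionally need speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, "true") before the synthesizer is created; the handler can then inspect evt.boundary_type to tell words from sentences.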
