Azure text-to-speech Python SDK timeout error

Question  Votes: 0  Answers: 1

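I have a minimal Azure text-to-speech example that fails on some computers and works on others. All of the computers run macOS 14.5 with Python 3.11.8 and azure-cognitiveservices-speech==1.41.1. There are no other differences between the computers running the code.

Some computers work immediately and generate the audio file, while others never work and time out with the following error: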

Error details: USP error: timeout waiting for the first audio chunk
Error: File:/Users/runner/work/1/s/external/azure-c-shared-utility/pal/ios-osx/tlsio_appleios.c Func:tlsio_appleios_destroy Line:196 tlsio_appleios_destroy called while not in TLSIO_STATE_CLOSED.

There is an open issue on GitHub, although it only mentions the TLS error, which I suspect is secondary: https://github.com/azure/azure-c-shared-utility/issues/658

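import os
import logging
import azure.cognitiveservices.speech as speechsdk

# `credentials` and TEMP_AZURE_AUDIO_PATH are defined elsewhere in my project.
logger = logging.getLogger(__name__)

def text_to_speech(text, voice_name='zh-CN-YunfengNeural'):
    # Create the output directory if it does not exist yet.
    os.makedirs(TEMP_AZURE_AUDIO_PATH, exist_ok=True)
    output_file = os.path.join(TEMP_AZURE_AUDIO_PATH, f"{text[:10]}--{voice_name}.wav")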

    speech_config = speechsdk.SpeechConfig(subscription=credentials.azure_speech_key, region=credentials.azure_service_region)  
    speech_config.speech_synthesis_voice_name = voice_name
    
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_file)

    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_text_async(text).get()

    # Check result status
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        logger.info("Speech synthesis completed.")
        # Verify that the file was created successfully
        if os.path.exists(output_file):
            print(f"File '{output_file}' was created successfully.")
        else:
            print(f"File '{output_file}' was not created.")
            return None
        
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print(f"Speech synthesis canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print(f"Error details: {cancellation_details.error_details}")
        return None

    return output_file

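I expect it to work on every computer, given the same environment, OS, Python packages, and credentials. 5 of the 8 computers produce the error; the others work every time.

Does anyone have any suggestions?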

azure text-to-speech
1 Answer

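Below is the complete modified code, which works around the error in two ways.

  • If the SDK call times out or fails, the code automatically falls back to the REST API for speech synthesis, so that network or TLS problems in the SDK do not block the functionality.

  • The try/except block catches exceptions early, logs the problem, and switches to the REST API, so that execution continues even when network or SDK problems occur.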
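The fallback idea can be sketched on its own, independent of the Speech SDK. Note that in the question the timeout is reported through a Canceled result rather than a raised exception, so a fallback should probably trigger both when the primary path raises and when it returns None. The helper name `synthesize_with_fallback` and its two callable parameters are hypothetical, introduced only for illustration.

```python
from typing import Callable, Optional


def synthesize_with_fallback(
    primary: Callable[[], Optional[str]],
    fallback: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return primary()'s result, or fallback()'s if primary raises or returns None."""
    try:
        result = primary()
    except Exception as exc:
        # An exception from the primary path is treated like a failed synthesis.
        print(f"Primary synthesis raised: {exc}")
        result = None

    if result is None:
        # Covers both the exception case and a canceled/timed-out synthesis.
        print("Falling back to the secondary synthesis path...")
        return fallback()
    return result
```

It could be called with two zero-argument callables, for example one wrapping the SDK path and one wrapping the REST path, each returning the output file path on success or None on failure.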

Code:

import os
import logging
import azure.cognitiveservices.speech as speechsdk
import requests

logging.basicConfig(level=logging.DEBUG)

AZURE_SPEECH_KEY = "<speech_key>"
AZURE_SERVICE_REGION = "<speech_region>"
TEMP_AZURE_AUDIO_PATH = "./azure_audio_output"

def text_to_speech(text, voice_name='zh-CN-YunfengNeural'):
    if not os.path.exists(TEMP_AZURE_AUDIO_PATH):
        os.makedirs(TEMP_AZURE_AUDIO_PATH)

    output_file = os.path.join(TEMP_AZURE_AUDIO_PATH, f"{text[:10]}-{voice_name}.wav")
    try:
        speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)
        speech_config.speech_synthesis_voice_name = voice_name
        audio_config = speechsdk.audio.AudioOutputConfig(filename=output_file)

        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        result = synthesizer.speak_text_async(text).get()

        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesis completed.")
            if os.path.exists(output_file):
                print(f"File '{output_file}' created successfully.")
            else:
                print("File not created.")
                return None

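        elif result.reason == speechsdk.ResultReason.Canceled:
            print(f"Synthesis canceled: {result.cancellation_details.reason}")
            if result.cancellation_details.error_details:
                print(f"Error details: {result.cancellation_details.error_details}")
            # The SDK reports the timeout as a Canceled result rather than an exception,
            # so trigger the REST fallback here as well.
            print("Attempting to use REST API fallback...")
            return text_to_speech_rest(text, voice_name)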
        return output_file

    except Exception as e:
        print(f"Exception occurred: {e}")
        print("Attempting to use REST API fallback...")
        return text_to_speech_rest(text, voice_name)

def text_to_speech_rest(text, voice_name='zh-CN-YunfengNeural'):
    url = f"https://{AZURE_SERVICE_REGION}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        'Ocp-Apim-Subscription-Key': AZURE_SPEECH_KEY,
        'Content-Type': 'application/ssml+xml',
        'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm'
    }
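    # Escape XML special characters so arbitrary text cannot break the SSML document.
    from xml.sax.saxutils import escape
    ssml = f"""
    <speak version='1.0' xml:lang='en-US'>
        <voice xml:lang='zh-CN' name='{voice_name}'>
            {escape(text)}
        </voice>
    </speak>"""

    try:
        # A timeout keeps the fallback itself from hanging on a bad network path.
        response = requests.post(url, headers=headers, data=ssml.encode('utf-8'), timeout=30)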
        if response.status_code == 200:
            output_file = os.path.join(TEMP_AZURE_AUDIO_PATH, f"{text[:10]}-{voice_name}-rest.wav")
            with open(output_file, "wb") as audio_file:
                audio_file.write(response.content)
            print(f"REST API: File '{output_file}' created successfully.")
            return output_file
        else:
            print(f"REST API Error: {response.status_code}, {response.text}")
            return None
    except Exception as e:
        print(f"REST API exception: {e}")
        return None

if __name__ == "__main__":
    text = "你好, 欢迎使用微软的语音服务。"
    voice_name = "zh-CN-YunfengNeural"
    output = text_to_speech(text, voice_name)

    if output:
        print(f"Audio file generated: {output}")
    else:
        print("Failed to generate audio.")

Output:

The code above ran successfully and produced speech output from the text input.
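Since the failure differs between otherwise identical machines, it may also be worth checking whether each machine can reach the service at the network level (a proxy, VPN, or firewall difference is one possible cause, though that is only a guess). The sketch below uses only the standard library to attempt a TLS handshake with the regional host used in the REST URL above. The function names `tts_host` and `can_open_tls` are hypothetical, and the SDK may connect to a different endpoint, so a successful check does not prove that the SDK path works.

```python
import socket
import ssl


def tts_host(region: str) -> str:
    """Return the regional hostname used by the REST endpoint in the code above."""
    return f"{region}.tts.speech.microsoft.com"


def can_open_tls(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection and TLS handshake to host:port succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return False
```

Running `can_open_tls(tts_host("<speech_region>"))` on a working machine and on a failing one would show whether the difference lies below the SDK.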
