Azure TTS audio distortion


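In JavaScript, I'm trying to send audio created by the "microsoft-cognitiveservices-speech-sdk" SpeechSynthesizer in Audio16Khz32KBitRateMonoMp3 format through Express to a React front-end application. The first few sentences sound fine, but the later ones are badly distorted.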

Here is the code that sends the audio:

    synthesizer.synthesizing = function (s, e) {
      currentAudioChunk = {
        audio: Buffer.from(e.result.audioData),
        offset: e.result.audioDuration / 10000, // Convert to milliseconds
      };

      sendEvent("audioData", {
        audio: currentAudioChunk.audio.toString("base64"),

        // This audioOffset data is null, and I'm sending it as a placeholder for now
        audioOffset: "0",
      });

      currentAudioChunk = null;
    };

When the audio is good, the string being sent looks like this:

"//NIxCElU/5IAY+IAYJ/2sMoSxHk0RH9BVzA0GTFmDQIAQ4Xv8iBn5oEqAUwtPDlBnyDh9f+ukXFoLdRAhSgoYRmQwQAIIJ0DVH/+eez/FwFkR+DcgYghGI/ImQQ0kU/////LRuTiDFw8fLR mk9Bv////v/+hTTN0Cuzk4TZ8qGCFf6UeMhhgB8BjVMd/t5V//NIxA4g2tqkAc9YAD7MqNv/AVISORkvL3sJA7Fje5e+5N7r/ZNxXO9991vl98n9A+bqGiY3myJPJAOEDjHGJfRMMWtAs fGoO9ejd0GhoOy32zfEvfHvZL4vfvTvfV5xO2TeymMriv6/ /j+v5/j2MNzc+w4oEElx7v6u/2t/UcW0Tg+oDGEGSAs30kiaCvEsa//NIxA0f6tbFtGsQsB/wSwiDZ+tZaEobTf5rXCqf/XqycoAzWXtvSKlDpriVZX/mmfM+YFhrAo1ookTMDgiGJRKHii3 7Q6Eh+fFkyMUXF4c2/fTr+0rpvsyXQevBDueX8PPz/PPX88VXXxySFgZQJQqJRW71f/Ss+4XRpRipAO1XGSKjcylABX/VcJmGEBBG"

But once the distortion starts, there are many repeated letters that look like garbage, like this:

"//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//// NIxHwAAANIAAAAAFVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV”

The repeated characters are present in the raw data received from Azure; they are not an artifact of the conversion to a string.
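One way to check this, sketched under the assumption that the code runs in Node.js, is to decode part of the Base64 string back into bytes and inspect them. The helper name `describeChunk` is hypothetical. The sample is a short excerpt of the distorted string above, taken at a 4-character boundary.

```javascript
// Hypothetical helper: decode a Base64 chunk and summarize its bytes.
function describeChunk(base64Chunk) {
  const bytes = Buffer.from(base64Chunk, "base64");
  let fillerCount = 0;
  for (const b of bytes) {
    if (b === 0x55) fillerCount += 1; // runs of "V" in Base64 decode to 0x55 bytes
  }
  return {
    length: bytes.length,
    fillerCount,
    text: bytes.toString("latin1"), // one character per byte, for eyeballing
    roundTrips: bytes.toString("base64") === base64Chunk,
  };
}

// Excerpt copied from the distorted string in the question.
const info = describeChunk("VVVMQU1FMy4xMDBV");
console.log(info);
```

The excerpt decodes to 0x55 bytes around the ASCII text LAME3.100, and re-encoding the bytes reproduces the same Base64 text, which is consistent with the repeated characters being part of the encoded audio itself.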

How do I get clean audio from Azure TTS?

I tried removing the V's, but that corrupted the data completely.
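A small sketch of why deleting characters breaks the data, assuming Node.js: each Base64 character carries 6 bits, so removing characters shifts the bit alignment of what follows, and "V" can also occur inside real payload. The byte values below are made up for illustration.

```javascript
// Made-up bytes for illustration: a header byte, three filler bytes (0x55), then payload.
const original = Buffer.from([0xff, 0x55, 0x55, 0x55, 0x12, 0x34, 0x56, 0x78, 0x9a]);
const encoded = original.toString("base64");

// Remove every "V" from the Base64 text, as attempted in the question.
const stripped = encoded.replace(/V/g, "");
const decoded = Buffer.from(stripped, "base64");

console.log(encoded, "->", stripped);
console.log(original.toString("hex"), "->", decoded.toString("hex"));
```

In this example the payload bytes after the filler no longer match the original, because one of the removed characters belonged to the payload and the remaining bits were regrouped.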

javascript azure text-to-speech
1 answer

WAV is the default input format. If your input audio is compressed (in a format such as MP3 or OPUS), you need to convert it to WAV and decode the audio buffer.

According to this doc, the JavaScript Speech SDK accepts WAV files with a sample rate of 16 kHz or 8 kHz, 16-bit depth, and mono PCM.

Convert your decoded file to WAV; you can then use the WAV directly in the audio stream.
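As a rough illustration of the WAV layout described above, the following Node.js sketch wraps raw PCM samples in a standard 44-byte RIFF/WAVE header for 16 kHz, 16-bit, mono audio. The function name `pcmToWav` and the silent sample data are assumptions made for the example. It does not decode MP3; that step would need a separate decoder.

```javascript
// Hypothetical helper: wrap raw 16-bit little-endian PCM in a 44-byte WAV header.
function pcmToWav(pcm, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the sample data
  return Buffer.concat([header, pcm]);
}

// One second of silence: 16000 samples, 2 bytes each.
const wav = pcmToWav(Buffer.alloc(16000 * 2));
console.log(wav.length, wav.toString("ascii", 0, 4), wav.readUInt32LE(28));
```

The resulting buffer can be written to a .wav file or sent as a single response body; the defaults match the 16 kHz, 16-bit, mono PCM format mentioned above.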

For text to speech, use plain text, and use this doc or an online converter to convert the Base64-encoded string to plain text.

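    // Decode a Base64 string into UTF-8 text (Node.js).
    const b64_to_utf8 = (str) => Buffer.from(str, "base64").toString("utf8");

    const base64String = "SGVsbG8sIHdvcmxkIQ==";
    const decodedString = b64_to_utf8(base64String);
    console.log("Decoded String:", decodedString);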


Output:

    Decoded String: Hello, world!

Use the decoded string in Azure text to speech. The code below converts text to speech in WAV format and saves the audio to a file using Azure AI Speech.

See this doc for information on text to speech with Azure AI services in JavaScript.

(function() {
    "use strict";
    
    const sdk = require("microsoft-cognitiveservices-speech-sdk");
    const readline = require("readline");

    const audioFile = "OutputAudio.wav";
    const speechConfig = sdk.SpeechConfig.fromSubscription(process.env.SPEECH_KEY, process.env.SPEECH_REGION);
    const audioConfig = sdk.AudioConfig.fromAudioFileOutput(audioFile);
    speechConfig.speechSynthesisVoiceName = "en-US-AvaMultilingualNeural"; 

    const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

    const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });

    rl.question("Enter some text that you want to convert to speech:\n> ", function(text) {
        rl.close();

        synthesizer.speakTextAsync(text,
            function(result) {
                if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
                    console.log("Text-to-speech synthesis complete. Audio saved to:", audioFile);
                } else {
                    console.error("Error synthesizing speech:", result.errorDetails);
                }
                synthesizer.close();
            },
            function(err) {
                console.trace("Error:", err);
                synthesizer.close();
            });

        console.log("Now synthesizing to:", audioFile);
    });
}());


