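In JavaScript, I am trying to stream audio created by the "microsoft-cognitiveservices-speech-sdk" SpeechSynthesizer in Audio16Khz32KBitRateMonoMp3 format through Express to a React front-end application. The first few sentences sound fine, but the later ones are badly distorted.
Here is the code that sends the audio: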
    synthesizer.synthesizing = function (s, e) {
        currentAudioChunk = {
            audio: Buffer.from(e.result.audioData),
            offset: e.result.audioDuration / 10000, // Convert 100 ns ticks to milliseconds
        };
        sendEvent("audioData", {
            audio: currentAudioChunk.audio.toString("base64"),
            // This audioOffset data is null, and I'm sending it as a placeholder for now
            audioOffset: "0",
        });
        currentAudioChunk = null;
    };
When the audio is good, the string being sent looks like this:
"//NIxCElU/5IAY+IAYJ/2sMoSxHk0RH9BVzA0GTFmDQIAQ4Xv8iBn5oEqAUwtPDlBnyDh9f+ukXFoLdRAhSgoYRmQwQAIIJ0DVH/+eez/FwFkR+DcgYghGI/ImQQ0kU/////LRuTiDFw8fLR mk9Bv////v/+hTTN0Cuzk4TZ8qGCFf6UeMhhgB8BjVMd/t5V//NIxA4g2tqkAc9YAD7MqNv/AVISORkvL3sJA7Fje5e+5N7r/ZNxXO9991vl98n9A+bqGiY3myJPJAOEDjHGJfRMMWtAs fGoO9ejd0GhoOy32zfEvfHvZL4vfvTvfV5xO2TeymMriv6/ /j+v5/j2MNzc+w4oEElx7v6u/2t/UcW0Tg+oDGEGSAs30kiaCvEsa//NIxA0f6tbFtGsQsB/wSwiDZ+tZaEobTf5rXCqf/XqycoAzWXtvSKlDpriVZX/mmfM+YFhrAo1ookTMDgiGJRKHii3 7Q6Eh+fFkyMUXF4c2/fTr+0rpvsyXQevBDueX8PPz/PPX88VXXxySFgZQJQqJRW71f/Ss+4XRpRipAO1XGSKjcylABX/VcJmGEBBG"
But once the distortion starts, the string contains long runs of repeated letters that look like garbage, as shown here:
"//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//// NIxHwAAANIAAAAAFVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV”
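The repeated characters are already present in the raw data received from Azure; they are not an artifact of the conversion to a string.
How can I get clean audio from Azure TTS?
I tried removing the V characters, but that completely corrupted the data.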
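For anyone debugging a similar stream, it can help to look at the bytes behind the Base64 text before changing it. Each Base64 character carries 6 bits, so deleting characters shifts every byte that follows, which is why stripping the V characters corrupts the data. The sketch below decodes a chunk and reports its first bytes; `inspectChunk` is a hypothetical helper, and the sample is a shortened excerpt of the distorted string above. A run of `V` decodes to `0x55` bytes, which looks like filler inside otherwise well-formed MP3 frames, although that reading is an inference from the bytes and not something the Azure documentation is known to confirm.

```javascript
// Hypothetical helper: summarize a Base64-encoded audio chunk.
function inspectChunk(base64Chunk) {
  // Drop whitespace that may have crept in when the string was copied.
  const bytes = Buffer.from(base64Chunk.replace(/\s+/g, ""), "base64");

  // An MPEG audio frame header starts with 11 set bits: 0xFF, then 0b111xxxxx.
  const looksLikeMpegFrame =
    bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;

  // Count the 0x55 bytes that show up as runs of "V" in the Base64 text.
  let paddingBytes = 0;
  for (const b of bytes) {
    if (b === 0x55) paddingBytes++;
  }

  return {
    length: bytes.length,
    header: bytes.subarray(0, 4).toString("hex"),
    looksLikeMpegFrame,
    paddingBytes,
    // Encoder tag that appears as "MQU1FMy4xMDB" in the Base64 text.
    hasLameTag: bytes.includes("LAME3.100"),
  };
}

// Shortened excerpt of the distorted chunk from the question.
const sample =
  "//NIxHwAAANIAAAAAF" + "V".repeat(9) + "MQU1FMy4xMDB" + "V".repeat(9);

console.log(inspectChunk(sample));
```

Running the same helper on a chunk that sounds fine shows whether both kinds of chunk begin with the same frame header, which is a quick way to tell an encoding problem from a playback problem.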
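WAV is the default input format. If your input audio is compressed (MP3, OPUS, or a similar format), you need to decode the audio buffer and convert it to WAV.
According to this doc, the JavaScript Speech SDK accepts WAV files containing mono, 16-bit PCM sampled at 16 kHz or 8 kHz.
Convert your decoded file to WAV; you can then use the WAV data directly in the audio stream.
For text to speech, use plain text. Use this doc or an online converter to turn the Base64-encoded string into plain text.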
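As a rough illustration of the WAV layout described above, the sketch below wraps raw PCM samples in a canonical 44-byte RIFF/WAVE header and reads the format fields back. `pcmToWav` and `readWavFormat` are hypothetical helpers, not part of the Speech SDK, and the defaults (16 kHz, mono, 16-bit) are taken from the format mentioned above. It does not decode MP3; producing the PCM samples from compressed audio needs a separate decoder that is not shown here.

```javascript
// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to raw PCM samples.
function pcmToWav(pcm, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);  // size of everything after this field
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);              // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);      // number of PCM bytes that follow

  return Buffer.concat([header, pcm]);
}

// Read the format fields back; assumes the canonical 44-byte layout written above.
function readWavFormat(wav) {
  return {
    channels: wav.readUInt16LE(22),
    sampleRate: wav.readUInt32LE(24),
    bitsPerSample: wav.readUInt16LE(34),
    dataBytes: wav.readUInt32LE(40),
  };
}

// One second of silence at 16 kHz, 16-bit, mono is 32000 bytes of PCM.
const silence = Buffer.alloc(16000 * 2);
const wav = pcmToWav(silence);
console.log(readWavFormat(wav));
```

Real WAV files can carry extra chunks before the `data` chunk, so `readWavFormat` is only safe on buffers produced by `pcmToWav` or files known to use the same layout.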
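    // Decode a Base64 string to UTF-8 text (uses the Node.js Buffer API).
    const b64_to_utf8 = (b64) => Buffer.from(b64, "base64").toString("utf8");
    let base64String = "SGVsbG8sIHdvcmxkIQ==";
    let decodedString = b64_to_utf8(base64String);
    console.log("Decoded String:", decodedString);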
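Output:
    Decoded String: Hello, world!
Use the decoded string with Azure text to speech. The code below converts text to speech in WAV format and saves the audio to a file using Azure AI Speech.
See this doc for more on text to speech with Azure AI services in JavaScript.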
    (function() {
        "use strict";
        const sdk = require("microsoft-cognitiveservices-speech-sdk");
        const readline = require("readline");
        const audioFile = "OutputAudio.wav";
        const speechConfig = sdk.SpeechConfig.fromSubscription(process.env.SPEECH_KEY, process.env.SPEECH_REGION);
        const audioConfig = sdk.AudioConfig.fromAudioFileOutput(audioFile);
        speechConfig.speechSynthesisVoiceName = "en-US-AvaMultilingualNeural";
        const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
        const rl = readline.createInterface({
            input: process.stdin,
            output: process.stdout
        });
        rl.question("Enter some text that you want to convert to speech:\n> ", function(text) {
            rl.close();
            synthesizer.speakTextAsync(text,
                function(result) {
                    if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
                        console.log("Text-to-speech synthesis complete. Audio saved to:", audioFile);
                    } else {
                        console.error("Error synthesizing speech:", result.errorDetails);
                    }
                    synthesizer.close();
                },
                function(err) {
                    console.trace("Error:", err);
                    synthesizer.close();
                });
            console.log("Now synthesizing to:", audioFile);
        });
    }());
Output: