I'm trying to stream Azure TTS from my server to the client using the Fetch API and a PassThrough push stream. The expected result is to receive the stream in chunks. The actual output is a response object with no data. I tried creating a ReadableStream with Fetch, but when I log the response I get an error saying my response object has size 0. I also checked whether anything was arriving in chunks, but everything has size 0. I've tried debugging my backend and, as far as I can tell, it works correctly. If anyone has solved this problem, or has demo code for streaming TTS with JavaScript, please let me know. Here is my actual function code, which I believe works:
const generateSpeechFromText = async (text) => {
const speechConfig = sdk.SpeechConfig.fromSubscription(
process.env.SPEECH_KEY,
process.env.SPEECH_REGION
);
speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";
speechConfig.speechSynthesisOutputFormat =
sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);
return new Promise((resolve, reject) => {
synthesizer.speakTextAsync(
text,
(result) => {
if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
const bufferStream = new PassThrough();
bufferStream.end(Buffer.from(result.audioData));
resolve(bufferStream);
} else {
console.error("Speech synthesis canceled: " + result.errorDetails);
reject(new Error("Speech synthesis failed"));
}
synthesizer.close();
},
(error) => {
console.error("Error in speech synthesis: " + error);
synthesizer.close();
reject(error);
}
);
});
};
Here is the index.js route code that sends to the frontend. I believe it works, but the error could be here:
app.get("/textToSpeech", async (request, reply) => {
if (textWorks) {
try {
const stream = await generateSpeechFromText(
textWorks
);
console.log("Stream created, sending to client: ", stream);
reply.type("audio/mpeg").send(stream);
} catch (err) {
console.error(err);
reply.status(500).send("Error in text-to-speech synthesis");
}
} else {
reply.status(404).send("OpenAI response not found");
}
});
Here is my frontend client code. I think the error is related to the response object, but I'm not sure:
// Fetch TTS from Backend
export const fetchTTS = async (): Promise<Blob | null> => {
try {
const response = await fetch("http://localhost:3000/textToSpeech", {
method: "GET",
});
// the response is size 0 and has no information
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const body = response.body;
console.log("Body", body);
if (!body) {
console.error("Response body is not a readable stream.");
return null;
}
const reader = body.getReader();
let chunks: Uint8Array[] = [];
const read = async () => {
const { done, value } = await reader.read();
if (done) {
return;
}
if (value) {
chunks.push(value);
}
await read();
};
await read();
console.log("Chunks", chunks);
const audioBlob = new Blob(chunks, { type: "audio/mpeg" });
// console.log("Audio Blob: ", audioBlob);
// console.log("Audio Blob Size: ", audioBlob.size);
return audioBlob.size > 0 ? audioBlob : null;
} catch (error) {
console.error("Error fetching text-to-speech audio:", error);
return null;
}
};
I tried reading the response directly into a blob, and after that I tried creating a ReadableStream object with the Fetch API, which is when I realized the response object has size 0. I've added console logging and tried debugging the server-side code, and based on the log statements it is working as expected: it splits the audio into chunks and pushes them to the client.
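As an aside, the recursive `read()` in the client code above can be written as a simple while loop that drains the stream before logging. The sketch below uses a hypothetical helper name, `readAllChunks`, which is not part of the original code:

```javascript
// Minimal sketch: drain a ReadableStream into one Uint8Array with a while loop.
// `readAllChunks` is a hypothetical helper, not part of the original code.
async function readAllChunks(stream) {
  const reader = stream.getReader();
  const chunks = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  // Concatenate all chunks into a single buffer
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

With this shape, the `console.log` naturally runs after the stream has been fully consumed, avoiding the empty-chunks log from the original code.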
The simple code below is an Express.js server that converts text to speech using the Microsoft Azure Cognitive Services Speech SDK.
Server side:
const express = require('express');
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const { PassThrough } = require('stream');
const app = express();
const generateSpeechFromText = async (text) => {
const speechConfig = sdk.SpeechConfig.fromSubscription(
process.env.SPEECH_KEY,
process.env.SPEECH_REGION
);
speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";
speechConfig.speechSynthesisOutputFormat =
sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);
return new Promise((resolve, reject) => {
synthesizer.speakTextAsync(
text,
(result) => {
if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
const bufferStream = new PassThrough();
bufferStream.end(Buffer.from(result.audioData));
resolve(bufferStream);
} else {
console.error("Speech synthesis canceled: " + result.errorDetails);
reject(new Error("Speech synthesis failed"));
}
synthesizer.close();
},
(error) => {
console.error("Error in speech synthesis: " + error);
synthesizer.close();
reject(error);
}
);
});
};
app.get("/textToSpeech", async (request, reply) => {
const textWorks = request.query.text || "Your desired text goes here"; // Use ?text=... from the client if provided
if (textWorks) {
try {
const stream = await generateSpeechFromText(textWorks);
reply.type("audio/mpeg");
stream.pipe(reply); // Send the stream directly without attempting JSON conversion
} catch (err) {
console.error(err);
reply.status(500).send("Error in text-to-speech synthesis");
}
} else {
reply.status(404).send("Text not provided");
}
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Server is running on port ${PORT}`);
});
Client side:
// Fetch TTS from Backend
export const fetchTTS = async () => {
try {
const response = await fetch("http://localhost:3000/textToSpeech", {
method: "GET",
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const audioBuffer = await response.arrayBuffer();
if (audioBuffer.byteLength > 0) {
const audioBlob = new Blob([audioBuffer], { type: "audio/mpeg" });
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
// Play the fetched audio
audio.play();
console.log("Fetched audio response:", audioBlob); // Print the audio response
return audioBlob;
} else {
console.error("Empty audio response");
return null;
}
} catch (error) {
console.error("Error fetching text-to-speech audio:", error);
return null;
}
};
Client side (alternative, plain HTML page):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Text-to-Speech Client</title>
</head>
<body>
<h1>Text-to-Speech Client</h1>
<label for="textInput">Enter Text:</label>
<textarea id="textInput" rows="4" cols="50" placeholder="Enter your text here"></textarea>
<button id="convertButton">Convert to Speech</button>
<audio id="audioPlayer" controls style="display:none;"></audio>
<script>
document.addEventListener('DOMContentLoaded', function () {
const convertButton = document.getElementById('convertButton');
const textInput = document.getElementById('textInput');
const audioPlayer = document.getElementById('audioPlayer');
convertButton.addEventListener('click', async function () {
const text = textInput.value.trim();
if (text) {
try {
const response = await fetch(`/textToSpeech?text=${encodeURIComponent(text)}`);
const audioBuffer = await response.arrayBuffer();
const audioBlob = new Blob([audioBuffer], { type: 'audio/mpeg' });
const audioUrl = URL.createObjectURL(audioBlob);
audioPlayer.src = audioUrl;
audioPlayer.style.display = 'block';
} catch (error) {
console.error('Error in text-to-speech request:', error);
}
} else {
alert('Please enter text before converting to speech.');
}
});
});
</script>
</body>
</html>