Part of my web service takes audio from the user and transcribes it to text with the Google Cloud Speech API. It works perfectly on desktop (both on my local machine and when deployed on the web server), but when I call it from a phone (iPhone, Chrome) it returns the following error: google.api_core.exceptions.InvalidArgument: 400 RecognitionAudio not set.
The JavaScript code writes the recorded audio into a WAV Blob in memory and uploads it:
// Store the audio data in chunks as it is recorded
mediaRecorder.addEventListener("dataavailable", (event) => {
  chunks.push(event.data);
});

// When recording stops, create a blob from the chunks and send it to the server
mediaRecorder.addEventListener("stop", () => {
  const blob = new Blob(chunks, { type: "audio/wav" });
  const url = URL.createObjectURL(blob);
  audio.src = url;
  uploadAudio(blob); // Send the audio data to the server
  chunks = []; // Clear the chunks array
});
The following Python code handles it on the server:
from google.cloud import speech

# Raw audio bytes from the body of the incoming HTTP request
audio_bytes = request.get_data()

# Instantiate a client
client = speech.SpeechClient()

# Settings for Speech v1:
# Initialize request arguments
config = speech.RecognitionConfig()
config.language_code = LANGUAGE
config.model = "latest_long"
config.enable_automatic_punctuation = True

audio = speech.RecognitionAudio()
audio.content = audio_bytes

speech_request = speech.RecognizeRequest(
    config=config,
    audio=audio,
)

# Make the request
response = client.recognize(request=speech_request)
According to the Google Cloud Speech documentation, RecognitionConfig.AudioEncoding is not required for WAV audio, so I omitted it.
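For reference, this is roughly what explicitly setting the omitted fields would look like (a sketch only; LINEAR16 and the 48000 Hz sample rate are assumptions for 16-bit PCM WAV, not values I have confirmed for these recordings):

# Sketch only: declaring the encoding explicitly instead of relying on
# the WAV header. LINEAR16 / 48000 Hz are assumed values for 16-bit PCM WAV.
config = speech.RecognitionConfig()
config.encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
config.sample_rate_hertz = 48000
config.language_code = LANGUAGE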
When the request comes from a mobile device, I see details = "Invalid recognition 'config': bad encoding.." in the log.
I have checked that the audio data passed to the Python code is not empty, but when the audio is recorded on a mobile device it apparently arrives with the wrong encoding.
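A minimal sketch of this kind of server-side check, assuming a Flask-style request object; the RIFF/WAVE comparison is just a standard way to confirm the bytes are actually a WAV container:

# Minimal debugging sketch (assumes a Flask-style `request` object):
# log the size and the first bytes of the upload. A real WAV file
# starts with b"RIFF" and has b"WAVE" at offset 8.
audio_bytes = request.get_data()
print("received", len(audio_bytes), "bytes")
print("header:", audio_bytes[:12])
print("looks like WAV:", audio_bytes[:4] == b"RIFF" and audio_bytes[8:12] == b"WAVE")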
What should I do to make this work on both desktop and mobile?