Saving microphone audio input when using Azure Speech to Text

Question

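I'm currently using Azure Speech to Text in my project. It recognizes speech input directly from the microphone (which is what I want) and saves the text output, but I'm also interested in saving the audio input so that I can listen to it later. Before moving to Azure I was using the Python SpeechRecognition library with recognize_google, which allowed me to use get_wav_data() to save the input as a .wav file. Is there something similar I can use with Azure? I read the documentation but could only find ways to save audio files for text to speech. My temporary solution is to save the audio input myself first and then run Azure STT on that audio file instead of using the microphone directly as input, but I'm worried this will slow down the process. Any ideas? Thank you in advance!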

Answer 1

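Darren from the Microsoft Speech SDK team here. Unfortunately, there is currently no built-in support for doing live recognition from the microphone while also writing the audio to a WAV file. We have heard this customer request before, and we will consider adding this feature in a future version of the Speech SDK.

What I think you can do at the moment (it requires some programming on your part) is use the Speech SDK with a push stream. You can write code that reads audio buffers from the microphone and writes them to a WAV file. At the same time, you can push the same audio buffers into the Speech SDK for recognition. We have Python samples showing how to use the Speech SDK with a push stream. See the function "speech_recognition_with_push_stream" in this file: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py. However, I'm not familiar with the Python options for reading real-time audio buffers from a microphone and writing them to a WAV file. Darren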
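
The push-stream idea above could look roughly like the sketch below. It is not taken from the linked sample. It assumes PyAudio for microphone capture (the answer does not name a library), the SDK's default push-stream format of 16 kHz, 16-bit, mono PCM, and credentials in the SPEECH_KEY and SPEECH_REGION environment variables. The names tee_audio, mic_chunks and recognize_and_save are made up for illustration.

```python
import os
import threading
import wave

RATE = 16000         # default push-stream format: 16 kHz, 16-bit, mono PCM
CHANNELS = 1
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHUNK_FRAMES = 1600  # 100 ms of audio per buffer


def tee_audio(chunks, wav_path, push_stream):
    """Write every raw PCM chunk to a WAV file and to push_stream.write().

    Returns the total number of bytes handled.
    """
    total = 0
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(RATE)
        for chunk in chunks:
            wav_file.writeframes(chunk)  # copy kept for later listening
            push_stream.write(chunk)     # same buffer goes to the recognizer
            total += len(chunk)
    return total


def mic_chunks(seconds):
    """Yield raw PCM buffers from the default microphone (assumes PyAudio)."""
    import pyaudio
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK_FRAMES)
    try:
        for _ in range(int(seconds * RATE / CHUNK_FRAMES)):
            yield stream.read(CHUNK_FRAMES)
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()


def recognize_and_save(wav_path, seconds=5):
    """Record for a fixed time, saving the audio and recognizing it live."""
    import azure.cognitiveservices.speech as speechsdk
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"])
    push_stream = speechsdk.audio.PushAudioInputStream()
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = threading.Event()
    recognizer.recognized.connect(
        lambda evt: print("Recognized:", evt.result.text))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    try:
        tee_audio(mic_chunks(seconds), wav_path, push_stream)
    finally:
        push_stream.close()    # signals end of audio to the SDK
        done.wait(timeout=30)  # give the service time to finish
        recognizer.stop_continuous_recognition()
```

Calling recognize_and_save("micaudio.wav", seconds=10) would record for about ten seconds. Because the microphone is opened only once, the WAV file and the recognizer receive exactly the same buffers.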

Answer 2

If you use Azure's speech_recognizer.recognize_once_async(), you can capture the microphone with pyaudio at the same time. Here is the code I use:

#!/usr/bin/env python3

# enter your output path here:
output_file='/Users/username/micaudio.wav'

import pyaudio, signal, sys, os, requests, wave
pa = pyaudio.PyAudio()
import azure.cognitiveservices.speech as speechsdk

def vocrec_callback(in_data, frame_count, time_info, status):
    global voc_data
    voc_data['frames'].append(in_data)
    return (in_data, pyaudio.paContinue)

def vocrec_start():
    global voc_stream
    global voc_data
    voc_data = {
        'channels':1 if sys.platform == 'darwin' else 2,
        'rate':44100,
        'width':pa.get_sample_size(pyaudio.paInt16),
        'format':pyaudio.paInt16,
        'frames':[]
    }
    voc_stream = pa.open(format=voc_data['format'],
                    channels=voc_data['channels'],
                    rate=voc_data['rate'],
                    input=True,
                    output=False,
                    stream_callback=vocrec_callback)
    
def vocrec_stop():
    # Stop capturing before releasing the stream.
    voc_stream.stop_stream()
    voc_stream.close()

def vocrec_write():
    with wave.open(output_file, 'wb') as wave_file:
        wave_file.setnchannels(voc_data['channels'])
        wave_file.setsampwidth(voc_data['width'])
        wave_file.setframerate(voc_data['rate'])
        wave_file.writeframes(b''.join(voc_data['frames']))

class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False
    def signal_handler(self, signal, frame):
        self.SIGINT = True
        print('You pressed Ctrl+C!')
        vocrec_stop()
        sys.exit(0)

def init_azure():
    global speech_recognizer
    # Check that the Azure key and region are set.
    my_speech_key = os.getenv('SPEECH_KEY')
    if my_speech_key is None:
        error_and_quit("Error: No Azure Key.")
    my_speech_region = os.getenv('SPEECH_REGION')
    if my_speech_region is None:
        error_and_quit("Error: No Azure Region.")
    _headers = {
        'Ocp-Apim-Subscription-Key': my_speech_key,
        'Content-type': 'application/x-www-form-urlencoded',
        # 'Content-Length': '0',
    }
    _URL = f"https://{my_speech_region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    _response = requests.post(_URL, headers=_headers)
    if _response.status_code != 200:
        error_and_quit("Error: Wrong Azure Key Or Region.")
    # Key and region are valid. Continue.
    speech_config = speechsdk.SpeechConfig(subscription=my_speech_key,
                                           region=my_speech_region)
    audio_config_stt = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, 'true')
    # Disable the profanity filter:
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_ProfanityOption, "2")
    speech_config.speech_recognition_language = "en-US"
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config_stt)

def error_and_quit(_error):
    # Print the message that was passed in, then exit with a failure code.
    print(_error)
    sys.exit(1)

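def recognize_speech():
    # Start the pyaudio capture, run one recognition, then save the audio.
    vocrec_start()
    print("Say something: ")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    print("Recording done.")
    vocrec_stop()
    vocrec_write()
    print("Recognized:", speech_recognition_result.text)
    sys.exit(0)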

handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)

init_azure()
recognize_speech()
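
To confirm that the recording was written as expected before listening to it, a small check with the standard wave module may help. This is only a sketch, and wav_summary is a made-up helper name, not part of the script above.

```python
import wave


def wav_summary(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds)."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return (wav_file.getnchannels(), wav_file.getsampwidth(),
                rate, frames / rate)
```

For the script above, wav_summary(output_file) should report a frame rate of 44100 and a sample width of 2 bytes, with a duration close to the time spent speaking.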

Answer 3

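I'm using the Speech SDK with C#. I'd like to know whether the latest version of the Speech SDK supports direct microphone input, and whether the captured audio can be stored in Azure Blob Storage?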
