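I'm new to the Azure Speech service. I'm using Twilio/Plivo to connect a phone number to Azure STT, and I process the transcript further once it comes back.
My problem: while I'm speaking, detection works fine. When I stop speaking or stay silent, it automatically processes the empty audio and returns an empty transcript, and this happens every 10-15 seconds. Speech is detected automatically, and I don't stop continuous recognition until the call ends.
Has anyone run into something similar, or is there something in the speech config I can change? Please let me know.
I used the Azure SDK and tried the initial silence and segmentation silence timeouts, but nothing changed. I'm running this in real time, so I can't add more than a second of delay.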
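I tried the sample code for continuous speech recognition to convert speech to text while avoiding the processing of empty transcripts caused by silence or noise.
I used InitialSilenceTimeoutMs and EndSilenceTimeoutMs to manage silence, last_recognition_time to filter valid recognitions, and evt.result.text.strip() to skip empty transcripts.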
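The filtering idea described above can be sketched on its own, without the SDK, which makes it easy to check the logic before wiring it into an event handler. This is only a minimal sketch: the names TranscriptFilter, accept, and min_gap_seconds are hypothetical and not part of the Azure SDK, and unlike the full sample it always lets the first non-empty result through.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptFilter:
    """Hypothetical helper: rejects empty transcripts and, optionally,
    results that arrive too soon after the last accepted one."""

    min_gap_seconds: float = 0.0
    _last_accepted: Optional[float] = field(default=None, init=False, repr=False)

    def accept(self, text: str, now: float) -> bool:
        # Silence or noise usually comes back as empty or whitespace-only text.
        if not text or not text.strip():
            return False
        # Optional throttle: skip results that follow an accepted one too closely.
        if (
            self._last_accepted is not None
            and now - self._last_accepted < self.min_gap_seconds
        ):
            return False
        self._last_accepted = now
        return True
```

Inside a recognized handler this might be called as transcript_filter.accept(evt.result.text, time.time()), forwarding the text downstream only when it returns True. Setting min_gap_seconds to 0 keeps only the empty-text check, which may be preferable on a live call where back-to-back utterances are all valid.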
Code:
import time

import azure.cognitiveservices.speech as speechsdk

SUBSCRIPTION_KEY = "<speechKey>"
REGION = "<speechRegion>"

speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
speech_config.speech_recognition_language = "en-US"

# Silence timeouts (in milliseconds), passed to the service as URI query parameters.
speech_config.set_service_property(name="InitialSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)
speech_config.set_service_property(name="EndSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

last_recognition_time = time.time()


def recognizing_handler(evt):
    """Handles partial recognition results."""
    if evt.result.text.strip():
        print(f"Recognizing: {evt.result.text}")


def recognized_handler(evt):
    """Handles final recognition results."""
    global last_recognition_time
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Accept only non-empty text, and only if more than 2 seconds have
        # passed since the last accepted result (or since startup).
        if evt.result.text.strip() and (time.time() - last_recognition_time > 2):
            print(f"Recognized: {evt.result.text}")
            last_recognition_time = time.time()
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech recognized.")


def canceled_handler(evt):
    """Handles recognition cancellation events."""
    # The reason and error text are exposed through cancellation_details.
    details = evt.cancellation_details
    print(f"Recognition canceled: {details.reason}")
    if details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {details.error_details}")


def session_started_handler(evt):
    """Handles session start events."""
    print("Session started.")


def session_stopped_handler(evt):
    """Handles session stop events."""
    print("Session stopped.")


recognizer.recognizing.connect(recognizing_handler)
recognizer.recognized.connect(recognized_handler)
recognizer.canceled.connect(canceled_handler)
recognizer.session_started.connect(session_started_handler)
recognizer.session_stopped.connect(session_stopped_handler)

print("Starting continuous recognition...")
recognizer.start_continuous_recognition()

# Keep the main thread alive until Ctrl+C, then stop recognition.
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Stopping recognition...")
    recognizer.stop_continuous_recognition()
Output: