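I'm currently using the Google Cloud Speech-to-Text API to transcribe a long audio file. I've managed to set this up with a Python script, but I'd like to add a custom phrase set to improve the transcription results, as described in their documentation: https://cloud.google.com/speech-to-text/docs/speech-adaptation
I have a list of phrases I want to add, but I can't figure out how to make it work in Python.
Here is my current code: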
from google.cloud import speech


def transcribe_gcs(gcs_uri, phrases):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
        use_enhanced=True,
        model="video",
        enable_automatic_punctuation=True,
        audio_channel_count=2,
        # Added to pass the custom phrases; this is the argument that fails.
        speech_context=speech.SpeechContext(phrases=phrases),
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    print("Waiting for operation to complete...")
    response = operation.result(timeout=20000)
    transcript = []
    confidence_ints = []
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print("Transcript: {}".format(result.alternatives[0].transcript))
        print("Confidence: {}".format(result.alternatives[0].confidence))
        transcript.append(str(result.alternatives[0].transcript))
        confidence_ints.append(float(result.alternatives[0].confidence))
    return transcript, confidence_ints
Everything works fine until I try to add the phrases using the speech_context parameter. Please help!
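One likely cause, assuming the google-cloud-speech v1 Python client is in use: the field on RecognitionConfig is named speech_contexts (plural) and holds a list of SpeechContext objects, so a singular speech_context keyword would not be recognized. The sketch below shows that shape. The helper names build_speech_contexts and build_config are hypothetical and only there for illustration; the other settings are copied from the code above.

```python
def build_speech_contexts(phrases):
    """Hypothetical helper: drop blank phrases and wrap the rest for `speech_contexts`."""
    cleaned = [p.strip() for p in phrases if p and p.strip()]
    # `speech_contexts` is a repeated field, so the value is a list of contexts.
    return [{"phrases": cleaned}] if cleaned else []


def build_config(phrases):
    """Build a RecognitionConfig with phrase hints (needs google-cloud-speech installed)."""
    # Imported lazily so the helper above can be used without the library.
    from google.cloud import speech

    return speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
        use_enhanced=True,
        model="video",
        enable_automatic_punctuation=True,
        audio_channel_count=2,
        # Note the plural field name and the list value.
        speech_contexts=[
            speech.SpeechContext(**ctx) for ctx in build_speech_contexts(phrases)
        ],
    )
```

The result of build_config(phrases) can then be passed to client.long_running_recognize(config=config, audio=audio) exactly as in the original function.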