I want to use Whisper AI to transcribe audio files. I learned from an article (https://www.assembleai.com/blog/how-to-run-openais-whisper-speech-recognition-model/) that this can be done with Python 3.8.16 and PyTorch 1.12.1.
I have also installed the latest version of ffmpeg.
I am running my code in a Jupyter notebook.
import sys
print(sys.version)
import torch
print(torch.__version__)
This returns:
3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 15:53:35) [MSC v.1929 64 bit (AMD64)]
1.12.1
Then I use this code to try to transcribe a test audio file:
import openai
import whisper
openai.api_key = "my open ai key"
audio_file= open(r"C:\Users\Chris\Python scripts\test_audio.mp3", "rb")
model = whisper.load_model("base")
result = model.transcribe(audio_file)
print(result)
When I run this, I get the following error:
TypeError Traceback (most recent call last)
Cell In[28], line 9
6 audio_file= open(r"C:\Users\Chris\Python scripts\test_audio.mp3", "rb")
8 model = whisper.load_model("base")
----> 9 result = model.transcribe(audio_file)
11 print(result)
File ~\anaconda3\lib\site-packages\whisper\transcribe.py:121, in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, initial_prompt, word_timestamps, prepend_punctuations, append_punctuations, **decode_options)
118 decode_options["fp16"] = False
120 # Pad 30-seconds of silence to the input audio, for slicing
--> 121 mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
122 content_frames = mel.shape[-1] - N_FRAMES
124 if decode_options.get("language", None) is None:
File ~\anaconda3\lib\site-packages\whisper\audio.py:131, in log_mel_spectrogram(audio, n_mels, padding, device)
129 if isinstance(audio, str):
130 audio = load_audio(audio)
--> 131 audio = torch.from_numpy(audio)
133 if device is not None:
134 audio = audio.to(device)
TypeError: expected np.ndarray (got _io.BufferedReader)
What am I missing to get this working?
You don't need to import both openai and whisper. For me, this code works:
import openai

openai.api_key = api_key
audio_file_path = "./audio.mp3"

with open(audio_file_path, "rb") as audio_file:
    response = openai.Audio.transcribe(model="whisper-1", engine="whisper", response_format="text", file=audio_file)

print(response)
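Note that openai.Audio.transcribe is the pre-1.0 interface of the openai package. If you are on openai 1.x instead, here is a minimal sketch of the equivalent call (assuming the newer client-based API and the same audio_file_path):

from openai import OpenAI

# assumes openai >= 1.0; the client object replaces the module-level api_key
client = OpenAI(api_key=api_key)

audio_file_path = "./audio.mp3"
with open(audio_file_path, "rb") as audio_file:
    # audio.transcriptions.create replaces the old Audio.transcribe call
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        response_format="text",
        file=audio_file,
    )

print(transcript)

Keep in mind this path sends the audio to OpenAI's hosted API, whereas whisper.load_model runs the model locally.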
The first argument of model.transcribe is of type Union[str, np.ndarray, torch.Tensor], which means it expects a string (i.e. a file path), a numpy array, or a torch tensor. Try passing the file path directly instead of opening it:
import whisper
audio_file = r"C:\Users\Chris\Python scripts\test_audio.mp3"
model = whisper.load_model("base")
result = model.transcribe(audio_file)
print(result)
Also, since you are working offline with the local whisper model, there is no need to set openai.api_key at all.
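If you prefer to hand model.transcribe a numpy array rather than a path, one option is whisper's own loader. A minimal sketch, assuming the same file path as in your question:

import whisper

model = whisper.load_model("base")

# load_audio decodes the file via ffmpeg and returns a float32 numpy array resampled to 16 kHz
audio = whisper.load_audio(r"C:\Users\Chris\Python scripts\test_audio.mp3")

result = model.transcribe(audio)
print(result["text"])

This also makes the dependency explicit: whisper shells out to ffmpeg for decoding, which is why you needed it installed.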