I am trying to convert a FileStorage object to a NumPy ndarray for ASR transcription. Here is my code:
@app.route('/api/transcribe', methods=['POST'])
def transcribe():
    if 'audioFile' not in request.files:
        return 'No audio file provided', 400
    file = request.files['audioFile']
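Next I have to convert it to an ndarray with a 16000 Hz sample rate and a single channel. How can I do this without saving a temporary file?
I tried soundfile, wav, and other libraries, but nothing worked.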
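To convert a FileStorage object to a NumPy ndarray, you can read its bytes, wrap them in an `io.BytesIO` buffer (a class in Python's standard `io` module), and let soundfile decode from that buffer, so no temporary file is needed.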
import io

import numpy as np
import resampy
import soundfile as sf
from flask import Flask, request

app = Flask(__name__)


@app.route('/api/transcribe', methods=['POST'])
def transcribe():
    if 'audioFile' not in request.files:
        return 'No audio file provided', 400
    file = request.files['audioFile']
    # Read the uploaded file content as bytes
    audio_data = file.read()
    # Decode the bytes from an in-memory buffer using soundfile
    with io.BytesIO(audio_data) as f:
        data, samplerate = sf.read(f)
    # Check if the audio has multiple channels: shape is (frames, channels)
    if data.ndim > 1:
        # Convert to mono by averaging channels
        data = np.mean(data, axis=1)
    # Resample the audio to 16 kHz if necessary
    if samplerate != 16000:
        # Requires the resampy package
        data = resampy.resample(data, samplerate, 16000)
        samplerate = 16000  # keep the reported rate in sync with the data
    # Ensure the data is in the correct dtype
    data = data.astype(np.float32)
    # Now you have your audio data in a NumPy array
    print(data.shape)  # Shape of the array: (frames,)
    print(samplerate)  # Sample rate of the array: 16000
    # Perform ASR transcription or other processing
    return 'Transcription done', 200
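The downmix and resample steps can also be pulled out into a small function that is easy to check without Flask. The following is only a sketch: `to_mono_16k` is a hypothetical helper name, and it swaps resampy for plain linear interpolation via `np.interp`. That has no anti-aliasing filter and gives lower quality, so treat it as a stand-in for testing the shapes, not as a replacement for a proper resampler.

```python
import numpy as np

TARGET_RATE = 16000  # sample rate required in the question


def to_mono_16k(data, samplerate, target_rate=TARGET_RATE):
    """Hypothetical helper: downmix to mono and resample to target_rate.

    `data` has shape (frames,) or (frames, channels), as soundfile returns it.
    """
    data = np.asarray(data, dtype=np.float64)
    if data.ndim > 1:
        # Average the channels to get a single mono channel
        data = data.mean(axis=1)
    if samplerate != target_rate and len(data) > 0:
        # Linear interpolation onto the new time grid (no anti-aliasing filter)
        n_out = int(round(len(data) / samplerate * target_rate))
        old_t = np.arange(len(data)) / samplerate
        new_t = np.arange(n_out) / target_rate
        data = np.interp(new_t, old_t, data)
    return data.astype(np.float32)


# One second of stereo audio at 44.1 kHz becomes one second of mono at 16 kHz
stereo = np.zeros((44100, 2))
mono = to_mono_16k(stereo, 44100)
print(mono.shape, mono.dtype)
```

Inside the route, the same function could be called on the `data` and `samplerate` values returned by `sf.read`.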
Make sure the required packages (soundfile and resampy) are installed. You can install them with pip:
pip install soundfile resampy
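If the uploads are known to be plain 16-bit PCM WAV files, the standard library's `wave` module can also read from an in-memory buffer, which avoids the soundfile dependency. This is a sketch under that assumption only: `wav_bytes_to_array` is a hypothetical helper, and it does not handle other sample widths or compressed formats such as MP3 or FLAC.

```python
import io
import wave

import numpy as np


def wav_bytes_to_array(raw):
    """Hypothetical helper: decode 16-bit PCM WAV bytes to (float32 array, rate)."""
    with wave.open(io.BytesIO(raw), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV is handled in this sketch")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    # Interleaved int16 samples -> shape (frames, channels), scaled to [-1, 1)
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, channels)
    return samples.astype(np.float32) / 32768.0, rate


# Build a tiny stereo WAV entirely in memory to exercise the helper
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(np.zeros((100, 2), dtype="<i2").tobytes())

audio, rate = wav_bytes_to_array(buf.getvalue())
print(audio.shape, rate)
```

In the Flask handler, the bytes would come from `request.files['audioFile'].read()` instead of the in-memory test buffer.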