Encoding mono audio to a file with PyAV: parameters match the documentation but still cause Errno 22


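While trying to use PyAV to encode live mono audio from a microphone into a compressed audio stream (with mp2 or flac as the encoder), the program keeps raising the exception

ValueError: [Errno 22] Invalid argument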

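To rule out the live microphone source as the cause, and to make the problematic code easier for others to run and test, I have removed the microphone source and now generate only a pure tone as the sequence of input buffers.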

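Every attempt to find a missing, mismatched, or incorrect parameter has only led me back to documentation and examples that do the same thing as my code.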

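I would like to hear from someone who has successfully used PyAV for mono audio: what are the correct approach and parameters for encoding mono frames into a mono stream?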

The package in use is av 10.0.0, installed with

pip3 install av --no-binary av

so that it uses the ffmpeg libraries supplied by my package manager, which are version 4.2.7.

The problematic Python code is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Recreating an error 22 when encoding sound with PyAV.

Created on Sun Feb 19 08:10:29 2023
@author: andrewm
"""
import typing
import sys
import math
import fractions

import av
from av import AudioFrame

""" Ensure some PyAudio constants are still defined without changing 
    the PyAudio recording callback function and without depending 
    on PyAudio simply for reproducing the PyAV bug [Errno 22] thrown in 
    File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push
"""
class PA_Stub():
    paContinue = True
    paComplete= False

pyaudio = PA_Stub()


"""Generate pure tone at given frequency with amplitude 0...1.0 at 
   sampling frequency fs and beginning at phase offset 'phase'.
   Returns the new phase after the sinusoid has cycled over the 
   sampling window length.
"""
def generate_tone(
        freq:int, phase:float, amp:float, fs, samp_fmt, buffer:bytearray
) -> float:
    assert samp_fmt == "s16", "Only s16 supported atm"
    samp_size_bytes = 2
    n_samples = int(len(buffer)/samp_size_bytes)
    window = [int(0) for i in range(n_samples)]
    theta = phase
    phase_inc = 2*math.pi * freq / fs
    for i in range(n_samples):
        v = amp * math.sin(theta)
        theta += phase_inc
        s = int((2**15-1)*v)
        window[i] = s
    for sample_i in range(len(window)):
        byte_i = sample_i * samp_size_bytes
        enc = window[sample_i].to_bytes(
                2, byteorder=sys.byteorder, signed=True
        )
        buffer[byte_i] = enc[0]
        buffer[byte_i+1] = enc[1]
    return theta


channels = 1
fs = 44100  # Record at 44100 samples per second
fft_size_samps = 256
chunk_samps = fft_size_samps * 10  # Record in chunks that are multiples of fft windows.

# print(f"fft_size_samps={fft_size_samps}\nchunk_samps={chunk_samps}")

seconds = 3.0
out_filename = "testoutput.wav"

# Store data in chunks for 3 seconds
sample_limit = int(fs * seconds)
sample_len = 0
frames = []  # Initialize array to store frames

ffmpeg_codec_name = 'mp2'  # flac, mp3, or libvorbis make same error.

sample_size_bytes = 2
buffer = bytearray(int(chunk_samps*sample_size_bytes))
chunkperiod = chunk_samps / fs
total_chunks = int(math.ceil(seconds / chunkperiod))
phase = 0.0

### uncomment if you want to see the synthetic data being used as a mic input.
# with open("test.raw","wb") as raw_out:
#     for ci in range(total_chunks):
#         phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
#         raw_out.write(buffer)
# print("finished gen test")
# sys.exit(0)
# #---- 

# Using mp2 or mkv as the container format gets the same error.
with av.open(out_filename+'.mp2', "w", format="mp2") as output_con:
    output_con.metadata["title"] = "My title"
    output_con.metadata["key"] = "value"
    channel_layout = "mono"
    sample_fmt = "s16p"

    ostream = output_con.add_stream(ffmpeg_codec_name, fs, layout=channel_layout)
    assert ostream is not None, "No stream!"
    cctx = ostream.codec_context
    cctx.sample_rate = fs
    cctx.time_base = fractions.Fraction(numerator=1,denominator=fs)
    cctx.format = sample_fmt
    cctx.channels = channels
    cctx.layout = channel_layout
    print(cctx, f"layout#{cctx.channel_layout}")
    
    # Define PyAudio-style callback for recording plus PyAV transcoding.
    def rec_callback(in_data, frame_count, time_info, status):
        global sample_len
        global ostream
        frames.append(in_data)
        nsamples = int(len(in_data) / (channels*sample_size_bytes))
        
        frame = AudioFrame(format=sample_fmt, layout=channel_layout, samples=nsamples)
        frame.sample_rate = fs
        frame.time_base = fractions.Fraction(numerator=1,denominator=fs)
        frame.pts = sample_len
        frame.planes[0].update(in_data)
        print(frame, len(in_data))
        
        for out_packet in ostream.encode(frame):
            output_con.mux(out_packet)
        for out_packet in ostream.encode(None):
            output_con.mux(out_packet)
        
        sample_len += nsamples
        retflag = pyaudio.paContinue if sample_len<sample_limit else pyaudio.paComplete
        return (in_data, retflag)

    print('Beginning')

    ### some e.g. PyAudio code which starts the recording process normally.
    # istream = p.open(
    #     format=sample_format,
    #     channels=channels,
    #     rate=fs,
    #     frames_per_buffer=chunk_samps,
    #     input=True,
    #     stream_callback=rec_callback
    # )
    # print(istream)

    # Normally at this point you just sleep the main thread while
    #  PyAudio calls back with mic data, but here it is all generated.
    for ci in range(total_chunks):
       phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
       ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)
       print('.', end='')

    print(" closing.")
    
    # Stop and close the istream 
    # istream.stop_stream()
    # istream.close()


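If you uncomment the RAW output section, you will find that the generated data can be imported into Audacity as PCM s16 mono 44100 Hz and plays the expected tone, so the generated audio data does not appear to be the problem.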
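As a cross-check that does not depend on Audacity, here is a minimal standard-library sketch that builds one chunk with the same parameters as the script (2600 Hz, amplitude 0.8, 44100 Hz, 2560 samples, signed 16-bit mono) and prints its byte length and sample range. The helper name tone_chunk is made up for this illustration, and it computes the phase per sample rather than accumulating it, so values may differ from the script's in the last bit.

```python
import array
import math

def tone_chunk(freq, phase, amp, fs, n_samples):
    """Return (samples, new_phase) for one chunk of a sine tone as signed 16-bit values."""
    phase_inc = 2 * math.pi * freq / fs
    samples = array.array("h", (
        int((2**15 - 1) * amp * math.sin(phase + i * phase_inc))
        for i in range(n_samples)
    ))
    return samples, phase + n_samples * phase_inc

chunk, next_phase = tone_chunk(2600, 0.0, 0.8, 44100, 2560)
raw = chunk.tobytes()  # native byte order, as in the script's bytearray
print(len(raw), min(chunk), max(chunk))
```

The byte length should equal the 5120 reported next to each AudioFrame in the console output, and the samples should stay within about 80% of the s16 range.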

The normal console output of the program before the exception is:

<av.AudioCodecContext audio/mp2 at 0x7f8e38202cf0> layout#4
Beginning
<av.AudioFrame 0, pts=0, 2560 samples at 44100Hz, mono, s16p at 0x7f8e38202eb0> 5120
.<av.AudioFrame 0, pts=2560, 2560 samples at 44100Hz, mono, s16p at 0x7f8e382025f0> 5120

The stack trace is:

Traceback (most recent call last):

  File "Dev/multichan_recording/av_encode.py", line 147, in <module>
    ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)

  File "Dev/multichan_recording/av_encode.py", line 121, in rec_callback
    for out_packet in ostream.encode(frame):

  File "av/stream.pyx", line 153, in av.stream.Stream.encode

  File "av/codec/context.pyx", line 484, in av.codec.context.CodecContext.encode

  File "av/audio/codeccontext.pyx", line 42, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode

  File "av/audio/resampler.pyx", line 101, in av.audio.resampler.AudioResampler.resample

  File "av/filter/graph.pyx", line 211, in av.filter.graph.Graph.push

  File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push

  File "av/error.pyx", line 336, in av.error.err_check

ValueError: [Errno 22] Invalid argument

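Edit: Interestingly, the error occurs on the second AudioFrame, so the first AudioFrame was evidently encoded correctly. The two frames are given identical attribute values apart from the presentation timestamp (pts), but leaving the PTS out and letting PyAV/ffmpeg generate it does not fix the error, so an incorrect PTS does not appear to be the cause.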
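One possibility that would fit a "first frame works, second frame fails" pattern, offered as an assumption to test rather than a confirmed diagnosis: rec_callback calls ostream.encode(None) after every frame, and passing None is the usual way to flush an encoder at end of stream. If the resampler's input treats that flush as end of stream and rejects any later frame with EINVAL, the second frame would fail exactly as observed. The toy model below only illustrates that state machine. ToyBufferSource and run are hypothetical names and are not part of PyAV or FFmpeg.

```python
import errno

class ToyBufferSource:
    """Hypothetical stand-in for a filter-graph input; not PyAV or FFmpeg code."""

    def __init__(self):
        self.eof = False
        self.accepted = 0

    def push(self, frame):
        if frame is None:
            # A None frame marks end of stream (assumed behaviour).
            self.eof = True
            return
        if self.eof:
            # Assumed behaviour: frames after end of stream are rejected.
            raise ValueError(f"[Errno {errno.EINVAL}] Invalid argument")
        self.accepted += 1

def run(frames, flush_every_frame):
    """Push all frames, optionally flushing after each one, then flush once at the end."""
    src = ToyBufferSource()
    for frame in frames:
        src.push(frame)
        if flush_every_frame:
            src.push(None)
    src.push(None)
    return src.accepted

print(run(["f0", "f1", "f2"], flush_every_frame=False))
try:
    run(["f0", "f1", "f2"], flush_every_frame=True)
except ValueError as exc:
    print("rejected after first flush:", exc)
```

If this model applies, the experiment to try in the real script would be to move the encode(None) loop out of rec_callback so that it runs once, after the last chunk.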

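From a quick look through av/filter/context.pyx, the exception must come from a bad return value of

res = lib.av_buffersrc_write_frame(self.ptr, frame.ptr)

I tried digging into av_buffersrc_write_frame in the ffmpeg source, but it is not clear what causes this error. The only obvious candidate is a mismatch between channel layouts, but my code sets the layout to the same value on both the stream and the frames. An older question, "pyav - cannot save stream as mono", ran into this problem, and its answer (one required parameter is undocumented) is the only reason the code now passes layout='mono' when creating the stream.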

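The program output shows that layout #4 is in use, and from https://github.com/FFmpeg/FFmpeg/blob/release/4.2/libavutil/channel_layout.h you can see that this is the value of the symbol AV_CH_FRONT_CENTER, which is the only channel in the MONO layout.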
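To make the bitmask reading concrete, here is a small sketch that decodes a layout number into channel names. The bit values are the first six channel masks from channel_layout.h; the table is deliberately incomplete, and decode_layout is a name invented for this example.

```python
# First six channel bits from FFmpeg's libavutil/channel_layout.h (partial table).
CHANNEL_BITS = {
    0x01: "FRONT_LEFT",
    0x02: "FRONT_RIGHT",
    0x04: "FRONT_CENTER",
    0x08: "LOW_FREQUENCY",
    0x10: "BACK_LEFT",
    0x20: "BACK_RIGHT",
}

def decode_layout(mask):
    """Return the names of the channels whose bits are set in mask."""
    return [name for bit, name in CHANNEL_BITS.items() if mask & bit]

print(decode_layout(4))  # the layout value printed by the script
print(decode_layout(3))  # stereo, for comparison
```

A value of 4 decodes to the single FRONT_CENTER channel, which is consistent with the stream really being mono.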

The mismatch must be in some other object attribute or undocumented parameter requirement.

How do I encode mono audio into a compressed stream with PyAV?

ffmpeg pyaudio pyav