I have an AVAudioPlayerNode attached to an AVAudioEngine. Sample buffers are provided to the playerNode via its scheduleBuffer() method.

However, the playerNode appears to be distorting the audio. Rather than simply "passing through" the buffers, the output is distorted and laced with static (though it remains mostly audible).

The relevant code:
let myBufferFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)

// Configure player node
let playerNode = AVAudioPlayerNode()
audioEngine.attach(playerNode)
audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: myBufferFormat)

// Provide audio buffers to playerNode
for await buffer in mySource.streamAudio() {
    await playerNode.scheduleBuffer(buffer)
}
In the example above, mySource.streamAudio() provides audio in real time from a ScreenCaptureKit SCStreamDelegate. The audio buffers arrive as CMSampleBuffers, are converted to AVAudioPCMBuffers, and are then delivered to the audio engine above through an AsyncStream. I have verified that the converted buffers are valid.
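For context, the bridge between the capture callback and the for-await loop would look roughly like the sketch below. The AudioSource type, its continuation property, and its handle(_:) method are illustrative assumptions, not code from the original post:

import AVFoundation

final class AudioSource {
    // Hypothetical bridge: the capture callback yields converted buffers
    // into this continuation, and the engine side consumes them.
    private var continuation: AsyncStream<AVAudioPCMBuffer>.Continuation?

    // Consumed by the for-await loop shown above.
    func streamAudio() -> AsyncStream<AVAudioPCMBuffer> {
        AsyncStream { continuation in
            self.continuation = continuation
        }
    }

    // Called from the ScreenCaptureKit sample handler queue after the
    // CMSampleBuffer has been converted to an AVAudioPCMBuffer.
    func handle(_ pcmBuffer: AVAudioPCMBuffer) {
        continuation?.yield(pcmBuffer)
    }
}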
Perhaps the buffers aren't arriving quickly enough? This graph of roughly 25,000 frames suggests that the inputNode is periodically inserting stretches of "zero" frames:

The distortion appears to be the result of these empty frames.
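One way to confirm this diagnosis is to count all-zero frames in each converted buffer before scheduling it. A minimal diagnostic sketch (the zeroFrameCount name is mine, not from the original code):

import AVFoundation

// Hypothetical diagnostic: count frames that are zero across all channels.
func zeroFrameCount(in buffer: AVAudioPCMBuffer) -> Int {
    guard let channels = buffer.floatChannelData else { return 0 }
    let channelCount = Int(buffer.format.channelCount)
    var zeros = 0
    for frame in 0..<Int(buffer.frameLength) {
        if (0..<channelCount).allSatisfy({ channels[$0][frame] == 0 }) {
            zeros += 1
        }
    }
    return zeros
}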
Even if we remove the AsyncStream from the pipeline and handle the buffers immediately in the ScreenCaptureKit callback, the distortion persists. Here is an end-to-end example that can be run as-is (the important part is didOutputSampleBuffer):
import AVFoundation
import ScreenCaptureKit

class Recorder: NSObject, SCStreamOutput {
    private let audioEngine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private var stream: SCStream?
    private let queue = DispatchQueue(label: "sampleQueue", qos: .userInitiated)

    func setupEngine() {
        let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)
        audioEngine.attach(playerNode)
        // playerNode --> mainMixerNode --> outputNode --> speakers
        audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: format)
        audioEngine.prepare()
        try? audioEngine.start()
        playerNode.play()
    }

    func startCapture() async {
        // Capture audio from Safari
        let availableContent = try! await SCShareableContent.excludingDesktopWindows(true, onScreenWindowsOnly: false)
        let display = availableContent.displays.first!
        let app = availableContent.applications.first(where: { $0.applicationName == "Safari" })!
        let filter = SCContentFilter(display: display, including: [app], exceptingWindows: [])

        let config = SCStreamConfiguration()
        config.capturesAudio = true
        config.sampleRate = 48000
        config.channelCount = 2

        stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try! stream!.addStreamOutput(self, type: .audio, sampleHandlerQueue: queue)
        try! stream!.addStreamOutput(self, type: .screen, sampleHandlerQueue: queue) // To prevent warnings
        try! await stream!.startCapture()
    }

    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of type: SCStreamOutputType) {
        switch type {
        case .audio:
            let pcmBuffer = createPCMBuffer(from: sampleBuffer)!
            playerNode.scheduleBuffer(pcmBuffer, completionHandler: nil)
        default:
            break // Ignore video frames
        }
    }

    func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
        var ablPointer: UnsafePointer<AudioBufferList>?
        try? sampleBuffer.withAudioBufferList { audioBufferList, blockBuffer in
            ablPointer = audioBufferList.unsafePointer
        }
        guard let audioBufferList = ablPointer,
              let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
              let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate, channels: absd.mChannelsPerFrame) else { return nil }
        return AVAudioPCMBuffer(pcmFormat: format, bufferListNoCopy: audioBufferList)
    }
}

let recorder = Recorder()
recorder.setupEngine()
Task {
    await recorder.startCapture()
}
Your "write buffers to file: distortion!" block is almost certainly doing something slow and blocking (such as writing to a file). You are being called once every ~170 ms (8192/48k). The tap block had better take less time than that to execute, or you will fall behind and drop buffers.

It is possible to keep up while writing to a file, but it depends on how you do it. If you are doing something badly inefficient (such as reopening and flushing the file for every buffer), you likely won't keep up.

If this theory is correct, the live speaker output should have no static; only your output file should.
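To illustrate the point, an efficient pattern is to open the AVAudioFile once and let the tap do nothing but a single write call. A minimal sketch, assuming a tap on the main mixer and an arbitrary destination path:

import AVFoundation

// Sketch: open the destination file once, outside the tap.
let format = audioEngine.mainMixerNode.outputFormat(forBus: 0)
let fileURL = URL(fileURLWithPath: "/tmp/capture.caf")  // assumed destination
let outputFile = try AVAudioFile(forWriting: fileURL, settings: format.settings)

audioEngine.mainMixerNode.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, _ in
    // AVAudioFile keeps the file open between calls, so a single write
    // normally completes well within the ~170 ms budget per buffer.
    try? outputFile.write(from: buffer)
}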
The culprit was the createPCMBuffer() function. Replace it with this and everything runs smoothly:
func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    let numSamples = AVAudioFrameCount(sampleBuffer.numSamples)
    let format = AVAudioFormat(cmAudioFormatDescription: sampleBuffer.formatDescription!)
    let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: numSamples)!
    pcmBuffer.frameLength = numSamples
    CMSampleBufferCopyPCMDataIntoAudioBufferList(sampleBuffer, at: 0, frameCount: Int32(numSamples), into: pcmBuffer.mutableAudioBufferList)
    return pcmBuffer
}
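The key difference is that CMSampleBufferCopyPCMDataIntoAudioBufferList copies the samples into memory owned by the AVAudioPCMBuffer, so the returned buffer no longer depends on the CMSampleBuffer's backing storage remaining alive after the callback returns.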
The original function in my question was taken directly from Apple's ScreenCaptureKit sample project. It technically works, and the audio sounds fine when written to a file, but apparently it isn't fast enough for real-time audio.

Edit: Actually, this probably has nothing to do with speed, since the new function is on average 2-3x slower due to copying the data. More likely, when the AVAudioPCMBuffer is created from a raw pointer, the underlying data gets deallocated.
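If that lifetime theory is right, a zero-copy variant could also work by keeping the CMSampleBuffer alive for as long as the AVAudioPCMBuffer wraps its memory, via the deallocator parameter of the bufferListNoCopy initializer. A sketch of that idea (the createPCMBufferNoCopy name is hypothetical, and this is untested against the original problem):

import AVFoundation

func createPCMBufferNoCopy(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    try? sampleBuffer.withAudioBufferList { audioBufferList, _ -> AVAudioPCMBuffer? in
        guard let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
              let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate,
                                         channels: absd.mChannelsPerFrame) else { return nil }
        // Capturing sampleBuffer in the deallocator retains the backing
        // CMBlockBuffer until the AVAudioPCMBuffer itself is deallocated.
        return AVAudioPCMBuffer(pcmFormat: format,
                                bufferListNoCopy: audioBufferList.unsafePointer,
                                deallocator: { _ in _ = sampleBuffer })
    }
}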
I have also seen this createPCMBuffer implementation return nil in some cases, breaking both our microphone waveform code and our ScreenCaptureKit recording implementation.

It seems that copying the unsafe pointer and using it outside the block passed to withAudioBufferList is not a good idea. We adjusted our implementation accordingly, which solved the problem:
func createPCMBuffer(for sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    try? sampleBuffer.withAudioBufferList { audioBufferList, _ -> AVAudioPCMBuffer? in
        guard let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription else { return nil }
        guard let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate, channels: absd.mChannelsPerFrame) else { return nil }
        return AVAudioPCMBuffer(pcmFormat: format, bufferListNoCopy: audioBufferList.unsafePointer)
    }
}
This solution is closer to Apple's original ScreenCaptureKit sample and should not suffer from the performance hit Hundley described above.