How can I detect the frequency and duration of beeps in a wav file?


I have a test.wav file, and I need to get the frequencies of the sounds in it and their lengths. To test my solution, I created a test.wav containing these beeps:

1500 Hz for 100 ms, 1900 Hz for 300 ms, 1200 Hz for 10 ms, 1900 Hz for 300 ms

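I want my solution to print each beep with its frequency and length. But my solution ignores the short 10 ms beep. After some experiments, I found that it ignores all beeps shorter than 100 ms, and it also adds some extra frequencies.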

My non-working solution:

import numpy as np
from scipy.io import wavfile

def extract_frequencies_and_durations(wav_path, frame_duration=0.02, freq_tolerance=5):
    rate, audio_data = wavfile.read(wav_path)
    if audio_data.ndim > 1:
        audio_data = np.mean(audio_data, axis=1)
    frame_length = int(rate * frame_duration)
    frequencies = []
    durations = []
    current_freq = None
    current_duration = 0
    for i in range(0, len(audio_data), frame_length):
        frame = audio_data[i:i + frame_length]
        fft_spectrum = np.fft.fft(frame)
        freqs = np.fft.fftfreq(len(fft_spectrum), 1 / rate)
        magnitude = np.abs(fft_spectrum)
        dominant_freq = abs(freqs[np.argmax(magnitude)])
        if current_freq is None or abs(dominant_freq - current_freq) > freq_tolerance:
            if current_freq is not None:
                frequencies.append(current_freq)
                durations.append(current_duration)
            current_freq = dominant_freq
            current_duration = frame_duration
        else:
            current_duration += frame_duration
    if current_freq is not None:
        frequencies.append(current_freq)
        durations.append(current_duration)
    
    return frequencies, durations

# output
wav_path = 'test.wav'
frequencies, durations = extract_frequencies_and_durations(wav_path)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")

How can I fix this solution, or is there another approach?

python numpy scipy wav
1 Answer

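The frame duration is a guess (0.02 s), and that guess affects the frequency analysis. The best choice appears to be the duration of the shortest burst, which gives better frequency/duration results.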

print("Test audio: 1517:0.05; 1921:0.3; 1245:0.01; 1977:0.3\n")
wav_path = 'test2.wav'
frame_duration = 0.01
print(f"frame_duration={frame_duration}")
frequencies, durations = extract_frequencies_and_durations(wav_path, frame_duration=frame_duration)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")


frame_duration = 0.02
print(f"\nframe_duration={frame_duration}")
frequencies, durations = extract_frequencies_and_durations(wav_path, frame_duration=frame_duration)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")

The test audio was generated with:

ffmpeg -f lavfi -i "sine=f=1517:d=0.05[0];sine=f=1921:d=0.3[1];sine=f=1245:d=0.01[2];sine=f=1977:d=0.3[3]; [0][1][2][3]concat=n=4:v=0:a=1" test2.wav
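If ffmpeg is not at hand, a roughly equivalent test signal can probably be synthesized with NumPy. The following is only a sketch: the helper name `make_test_tones`, the amplitude of 0.125, and the 16-bit output format are assumptions made for illustration, not something taken from the command above.

```python
import numpy as np

def make_test_tones(segments, rate=44100, amplitude=0.125):
    """Concatenate sine bursts given as (frequency_hz, duration_s) pairs.

    Hypothetical helper; amplitude and int16 scaling are assumed values.
    """
    chunks = []
    for freq, duration in segments:
        n = round(rate * duration)        # samples in this burst
        t = np.arange(n) / rate           # each burst restarts at phase 0
        chunks.append(amplitude * np.sin(2 * np.pi * freq * t))
    signal = np.concatenate(chunks)
    # Scale to the 16-bit PCM range
    return (signal * 32767).astype(np.int16)

# Same frequencies and durations as the test audio described here
samples = make_test_tones([(1517, 0.05), (1921, 0.3), (1245, 0.01), (1977, 0.3)])
print(len(samples), samples.dtype)
```

The resulting array can then be saved with `scipy.io.wavfile.write('test2.wav', 44100, samples)` and passed to `extract_frequencies_and_durations`.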

Result:

Test audio: 1517:0.05; 1921:0.3; 1245:0.01; 1977:0.3

frame_duration=0.01
rate: 44100, frame_duration: 0.01, frame_length: 441
1500.00Hz - 50.00ms
1900.00Hz - 300.00ms
1200.00Hz - 10.00ms
2000.00Hz - 300.00ms

frame_duration=0.02
rate: 44100, frame_duration: 0.02, frame_length: 882
1500.00Hz - 40.00ms
1900.00Hz - 300.00ms
1250.00Hz - 20.00ms
2000.00Hz - 300.00ms
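The rounded frequencies in the results above are consistent with the FFT bin spacing, which equals the sample rate divided by the frame length: 100 Hz for a 0.01 s frame and 50 Hz for a 0.02 s frame at 44100 Hz. A shorter frame therefore resolves short bursts in time but reports coarser frequencies. The sketch below only illustrates that arithmetic; the helper names are hypothetical, and it assumes the dominant bin is the one nearest the true tone, which need not hold for frames that straddle two bursts.

```python
def bin_spacing_hz(rate, frame_duration):
    """Spacing between adjacent FFT bins for one frame, in Hz."""
    frame_length = int(rate * frame_duration)  # same rounding as the question's code
    return rate / frame_length

def nearest_bin_hz(freq, rate, frame_duration):
    """Centre frequency of the FFT bin closest to `freq` (assumed to be the peak)."""
    spacing = bin_spacing_hz(rate, frame_duration)
    return round(freq / spacing) * spacing

for frame_duration in (0.01, 0.02):
    print(f"frame_duration={frame_duration}, "
          f"bin spacing={bin_spacing_hz(44100, frame_duration):.0f} Hz")
    for freq in (1517, 1921, 1245, 1977):
        print(f"  {freq} Hz -> {nearest_bin_hz(freq, 44100, frame_duration):.2f} Hz")
```

With these inputs the nearest bins are 1500, 1900, 1200 and 2000 Hz for the 0.01 s frame, and 1500, 1900, 1250 and 2000 Hz for the 0.02 s frame, matching the frequencies listed in the results.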