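I have a test.wav file, and I need to get the frequencies of the sounds in it and their durations. To test my solution, I created test.wav with these "beep" sounds:
1500 Hz for 100 ms, 1900 Hz for 300 ms, 1200 Hz for 10 ms, 1900 Hz for 300 ms
I want my solution to print each beep with its frequency and duration. However, it ignores the short 10 ms beep. After some experimenting, I found that it ignores all beeps shorter than 100 ms, and it also reports some frequencies that are not in the file.
My non-working solution: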
import numpy as np
from scipy.io import wavfile

def extract_frequencies_and_durations(wav_path, frame_duration=0.02, freq_tolerance=5):
    rate, audio_data = wavfile.read(wav_path)
    if audio_data.ndim > 1:
        audio_data = np.mean(audio_data, axis=1)
    frame_length = int(rate * frame_duration)
    frequencies = []
    durations = []
    current_freq = None
    current_duration = 0
    for i in range(0, len(audio_data), frame_length):
        frame = audio_data[i:i + frame_length]
        fft_spectrum = np.fft.fft(frame)
        freqs = np.fft.fftfreq(len(fft_spectrum), 1 / rate)
        magnitude = np.abs(fft_spectrum)
        dominant_freq = abs(freqs[np.argmax(magnitude)])
        if current_freq is None or abs(dominant_freq - current_freq) > freq_tolerance:
            if current_freq is not None:
                frequencies.append(current_freq)
                durations.append(current_duration)
            current_freq = dominant_freq
            current_duration = frame_duration
        else:
            current_duration += frame_duration
    if current_freq is not None:
        frequencies.append(current_freq)
        durations.append(current_duration)
    return frequencies, durations

# output
wav_path = 'test.wav'
frequencies, durations = extract_frequencies_and_durations(wav_path)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")
How can I fix this solution, or is there a different approach?
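The frame duration is a guess (0.02 s), and that guess affects the frequency analysis: each frame reports only one dominant frequency, so a burst shorter than the frame gets blended with its neighbors. The best guess seems to be the duration of the shortest burst, which gives better frequency/duration results.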
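To make the trade-off concrete, here is a minimal sketch that computes how far apart the FFT bins are for a given frame duration. The helper name `fft_bin_spacing` is made up for illustration, and the 44100 Hz sample rate is taken from the results reported in this answer.

```python
def fft_bin_spacing(rate, frame_duration):
    """Return (frame_length, bin_spacing_hz) for an unpadded FFT of one frame."""
    frame_length = int(round(rate * frame_duration))
    # An N-point FFT at sample rate `rate` has bins every rate / N Hz,
    # which is roughly 1 / frame_duration.
    return frame_length, rate / frame_length

for frame_duration in (0.01, 0.02, 0.1):
    n, spacing = fft_bin_spacing(44100, frame_duration)
    print(f"{frame_duration * 1000:.0f} ms frame: {n} samples, bins every {spacing:.1f} Hz")
```

Shorter frames track short bursts better but can only report frequencies on a coarser grid; longer frames do the opposite.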
print("Test audio: 1517:0.05; 1921:0.3; 1245:0.01; 1977:0.3\n")
wav_path = 'test2.wav'

frame_duration = 0.01
print(f"frame_duration={frame_duration}")
frequencies, durations = extract_frequencies_and_durations(wav_path, frame_duration=frame_duration)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")

frame_duration = 0.02
print(f"\nframe_duration={frame_duration}")
frequencies, durations = extract_frequencies_and_durations(wav_path, frame_duration=frame_duration)
for freq, dur in zip(frequencies, durations):
    print(f"{freq:.2f}Hz - {dur * 1000:.2f}ms")
The test audio was generated with:
ffmpeg -f lavfi -i "sine=f=1517:d=0.05[0];sine=f=1921:d=0.3[1];sine=f=1245:d=0.01[2];sine=f=1977:d=0.3[3]; [0][1][2][3]concat=n=4:v=0:a=1" test2.wav
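If ffmpeg is not at hand, a roughly comparable test file could be produced in Python. This is only a sketch: the function name `make_beeps`, the output name `test2_py.wav`, and the amplitude of 0.5 are arbitrary choices for illustration, and the result is not guaranteed to be sample-identical to the ffmpeg output.

```python
import numpy as np
from scipy.io import wavfile

def make_beeps(segments, rate=44100, amplitude=0.5):
    """Concatenate sine bursts given as (frequency_hz, duration_s) pairs."""
    parts = []
    for freq, duration in segments:
        n = int(round(rate * duration))   # samples in this burst
        t = np.arange(n) / rate           # time axis restarts for each burst
        parts.append(amplitude * np.sin(2 * np.pi * freq * t))
    signal = np.concatenate(parts)
    # Scale to 16-bit PCM, the most common WAV sample format.
    return (signal * 32767).astype(np.int16)

# Same tones and durations as in the ffmpeg command above.
segments = [(1517, 0.05), (1921, 0.3), (1245, 0.01), (1977, 0.3)]
samples = make_beeps(segments)
wavfile.write("test2_py.wav", 44100, samples)
```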
Results:
Test audio: 1517:0.05; 1921:0.3; 1245:0.01; 1977:0.3
frame_duration=0.01
rate: 44100, frame_duration: 0.01, frame_length: 441
1500.00Hz - 50.00ms
1900.00Hz - 300.00ms
1200.00Hz - 10.00ms
2000.00Hz - 300.00ms
frame_duration=0.02
rate: 44100, frame_duration: 0.02, frame_length: 882
1500.00Hz - 40.00ms
1900.00Hz - 300.00ms
1250.00Hz - 20.00ms
2000.00Hz - 300.00ms
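The reported frequencies are rounded because an unpadded FFT of a short frame only has bins on a coarse grid. One possible refinement, not part of the code above, is to apply a window and zero-pad each frame before the FFT, which samples the spectrum on a finer grid without lengthening the frame. The sketch below assumes a Hann window and a hypothetical `pad_factor` parameter; it interpolates the peak position but does not help separate two tones that are close together.

```python
import numpy as np

def dominant_frequency(frame, rate, pad_factor=8):
    """Estimate the strongest frequency in one frame using a zero-padded FFT."""
    frame = np.asarray(frame, dtype=float)
    # A Hann window reduces spectral leakage from the frame edges.
    windowed = frame * np.hanning(len(frame))
    # Zero-padding to pad_factor * N samples gives pad_factor times more bins.
    n_fft = pad_factor * len(frame)
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1 / rate)
    return freqs[np.argmax(magnitude)]

# A single 10 ms frame of a 1245 Hz tone, like the shortest burst in test2.wav.
rate = 44100
t = np.arange(int(round(rate * 0.01))) / rate
frame = np.sin(2 * np.pi * 1245 * t)
print(f"unpadded: {dominant_frequency(frame, rate, pad_factor=1):.1f} Hz")
print(f"padded:   {dominant_frequency(frame, rate, pad_factor=8):.1f} Hz")
```

With `pad_factor=8` and a 10 ms frame, the grid spacing drops from 100 Hz to 12.5 Hz, so the estimate for the short burst should land much closer to the true tone.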