I'm using Python and the pytube library to download the audio of every video in a given YouTube playlist link. This is what I came up with:
import pytube

playlist = pytube.Playlist(youtube_playlist_link)
for video in playlist.videos:
    # itag 251 is the audio stream being downloaded
    video.streams.get_by_itag(251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')
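Downloading 12 audio streams took a little over a minute, which is far slower than my connection allows. I tried async, but I don't think it works with this library, so I switched to multiprocessing: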
import pytube
from multiprocessing import Process

yt = pytube.Playlist(youtube_playlist_link)

def first_half():
    # Downloads the first half of the playlist audio
    for video in range(0, 7, 1):
        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        y.streams.get_by_itag(251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')

def second_half():
    # Downloads the second half of the playlist audio (same body as first_half)
    for video in range(7, 12, 1):
        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        y.streams.get_by_itag(251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')

if __name__ == '__main__':
    fh = Process(target=first_half)
    sh = Process(target=second_half)
    fh.start()
    sh.start()
    fh.join()
    sh.join()
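This cut my earlier one-minute time by more than half, but it is inefficient. If my CPU has more than 4 cores, is there a way to use all of them to download the audio without creating a new function for each core? (For example, all 6 cores would split the 12 songs evenly, with a single function downloading them all.)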
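An async approach would be best, or perhaps some combination of async and multiprocessing, since this work is mostly network- and disk-bound. However, I'm not familiar with pytube or async-pytube, so I'll only address the problem of not having to create a new function for each processor.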
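On the network-bound point: if async does not fit the library, a thread pool from the standard library is one possible middle ground, since threads that are waiting on the network can overlap. The following is only a sketch and has not been tested against pytube. `download_all` and `download_one` are hypothetical names, and `download_one` stands for whatever function downloads a single URL.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, download_one, max_workers=8):
    """Run download_one(url) for every URL on a pool of threads.

    download_one is a hypothetical callable supplied by the caller. With
    pytube it might wrap the single-video download used in the question.
    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands each URL to the next free thread, so the playlist
        # does not have to be split into per-worker ranges by hand.
        return list(pool.map(download_one, urls))


if __name__ == '__main__':
    # Stand-in for a real download so the sketch runs without network access.
    def fake_download(url):
        return f'saved:{url}'

    print(download_all(['a', 'b', 'c'], fake_download, max_workers=2))
```

Because `pool.map` distributes the URLs itself, a single function covers every worker. The code below stays with multiprocessing, as in the question.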
import os
from multiprocessing import cpu_count, Process

import pytube

yt = pytube.Playlist(youtube_playlist_link)

def download_vids(worker_id: int, start: int, end: int):
    # Each process downloads the playlist entries in [start, end)
    for video in range(start, end, 1):
        print(f'[{worker_id}] - downloading video', video)
        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        y.streams.get_by_itag(251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')

if __name__ == '__main__':
    CPU_COUNT = cpu_count()
    # better than cpu_count but only supported on some *nix distros
    try:
        CPU_COUNT = len(os.sched_getaffinity(0))
    except AttributeError:
        pass

    NUM_VIDS = len(yt.video_urls)
    print('CPU COUNT: ', CPU_COUNT)
    print('NUM VIDS: ', NUM_VIDS)
    print()

    start_end = []
    step, rem = divmod(NUM_VIDS, CPU_COUNT)
    off = 0
    # evenly distribute downloads amongst processes;
    # the first `rem` processes take one extra video each
    for i in range(CPU_COUNT):
        start = i * step + off
        end = start + step
        if rem > 0:
            off += 1
            end += 1
        start_end.append({'start': start, 'end': end})
        rem -= 1

    # create processes
    procs: list[Process] = []
    for i in range(CPU_COUNT):
        proc = Process(target=download_vids, kwargs=dict(worker_id=i, **start_end[i]))
        procs.append(proc)

    # start processes
    for i, proc in enumerate(procs):
        proc.start()
        print('-- started process:', i)

    # join processes
    for i, proc in enumerate(procs):
        proc.join()
        print('-- joined process:', i)
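The start/end bookkeeping above can also be isolated into a small helper, which makes the distribution easy to check without downloading anything. This is a minimal sketch of the same divmod idea. `split_ranges` is a hypothetical name, not part of pytube or multiprocessing.

```python
def split_ranges(n_items, n_workers):
    """Return (start, end) index pairs covering range(n_items) as evenly as possible."""
    step, rem = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `rem` workers take one extra item each.
        end = start + step + (1 if worker < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == '__main__':
    # 12 videos on 6 workers, then on 5 workers (uneven split)
    print(split_ranges(12, 6))
    print(split_ranges(12, 5))
```

Each pair could then be passed as the `start` and `end` arguments of one process. When there are more workers than videos, the trailing workers receive empty ranges and simply do nothing.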