How can I use multiprocessing effectively with the pytube library in Python to speed up downloads?


I'm using Python and the pytube library to download the audio of every video in a given YouTube playlist. This is what I came up with:

import pytube

playlist = pytube.Playlist(youtube_playlist_link)

for video in playlist.videos:
    pytube.query.StreamQuery.get_by_itag(video.streams,itag=251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')

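Downloading 12 audio streams took over a minute, which is far slower than my connection allows. I tried async, but I don't think it works with this library, so I switched to multiprocessing.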

import pytube
from multiprocessing import Process

yt = pytube.Playlist(youtube_playlist_link)

def first_half(): 
    for video in range(0,7,1):
        #Downloads first half of playlist audio
        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        pytube.query.StreamQuery.get_by_itag(y.streams, itag=251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')

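def second_half():
    for video in range(7,12,1):
        #Download second half of playlist audio
        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        pytube.query.StreamQuery.get_by_itag(y.streams, itag=251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')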

if __name__ == '__main__':
    fh = Process(target=first_half)
    sh = Process(target=second_half)
    fh.start()
    sh.start()
    fh.join()
    sh.join()
   

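This cut my previous time of about a minute by more than half, but it's inefficient. If my CPU has more than 4 cores, is there a way to use them to download the audio without creating a new function for each core? (All 6 cores would split the 12 songs evenly, with a single function downloading them all.)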

python-3.x multiprocessing
1 answer

An async approach would be best, or possibly some combination of async and multiprocessing, since this work is mostly network- and disk-bound.
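Because the work is dominated by waiting on the network and disk, a thread pool is one way to overlap the downloads without async support in the library. The following is only a minimal sketch that swaps in threads (`concurrent.futures.ThreadPoolExecutor`) for async. The helper names `download_all`, `download_audio`, and `download_playlist_audio` are hypothetical, and it assumes itag 251 exists for every video, as in the question.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, download_one, max_workers=8):
    """Run download_one(url) for every URL on a pool of threads.

    Returns a dict mapping each URL to the function's return value,
    or to the exception it raised, so one failure does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[url] = exc
    return results


def download_audio(url, output_path="."):
    """Hypothetical per-video worker using the same pytube calls as the question."""
    import pytube  # imported lazily so download_all can be used without pytube

    stream = pytube.YouTube(url).streams.get_by_itag(251)
    if stream is None:
        raise ValueError(f"itag 251 not available for {url}")
    return stream.download(output_path=output_path)


def download_playlist_audio(playlist_url, output_path=".", max_workers=8):
    """Hypothetical entry point: fetch the playlist's URLs, then download concurrently."""
    import pytube

    urls = list(pytube.Playlist(playlist_url).video_urls)
    return download_all(
        urls,
        lambda url: download_audio(url, output_path),
        max_workers=max_workers,
    )
```

The number of threads is not tied to the number of CPU cores here; `max_workers=8` is an arbitrary starting point to tune against the available bandwidth.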

However, I'm not familiar with pytube or async-pytube, so I'll only address the problem of not having to create a new function for each processor.

import pytube
from multiprocessing import cpu_count, Process
import os


yt = pytube.Playlist(youtube_playlist_link)

def download_vids(id: int, start: int, end: int):
    for video in range(start, end, 1):
        print(f'[{id}] - downloading video', video)

        x = yt.video_urls[video]
        y = pytube.YouTube(x)
        pytube.query.StreamQuery.get_by_itag(y.streams, itag=251).download(output_path=r'C:\Users\Anderson\OneDrive\Desktop\Vids')


if __name__ == '__main__':
    CPU_COUNT = cpu_count()

    # better than cpu_count but only supported on some *nix distros
    try:
        CPU_COUNT = len(os.sched_getaffinity(0))
    except AttributeError:
        pass

    NUM_VIDS = len(yt.video_urls)
    
    print('CPU COUNT: ', CPU_COUNT)
    print('NUM VIDS: ', NUM_VIDS)
    print()

    start_end = []
    step, rem = divmod(NUM_VIDS, CPU_COUNT)
    off = 0

    # evenly distribute downloads amongst processes
    for i in range(CPU_COUNT):
        start = i * step + off
        end = start + step
        if rem > 0:
            off += 1
            end += 1
        
        start_end.append({'start': start, 'end': end})
        rem -= 1


    # create processes
    procs: list[Process] = []
    for i in range(CPU_COUNT):
        proc = Process(target=download_vids, kwargs=dict(id=i, **start_end[i]))
        procs.append(proc)

    # start processes
    for i, proc in enumerate(procs):
        proc.start()
        print('-- started process:', i)

    # join processes
    for i, proc in enumerate(procs):
        proc.join()
        print('-- joined process:', i)
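The start/end arithmetic used to spread the videos across processes can be checked on its own, without pytube or any network access. Below is a minimal sketch of the same even-split idea, assuming a hypothetical helper named `split_ranges` that is not part of the code above: each worker gets `step` items, and the first `rem` workers get one extra.

```python
def split_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, end) pairs.

    The first (n_items % n_workers) workers receive one extra item, so the
    sizes of any two ranges differ by at most one.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    step, rem = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        end = start + step + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


print(split_ranges(12, 6))
# [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (10, 12)]
print(split_ranges(13, 6))
# [(0, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
```

When there are fewer videos than workers, the trailing ranges are empty (for example `(3, 3)`), so those processes simply have nothing to download.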
