How do I scrape a movie with Python?

Question

I am trying to download the film 《石门》, which was screened at the 60th Golden Horse Awards. I found a streaming link:

https://www.fofoyy.com/dianying/96937

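I could not find the video source in the page source, but when I inspected the page with F12 I found two M3U8 files among the network requests. By combining them, I got the final URL I need to access: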

https://v8.longshengtea.com/yyv8/202310/06/2yJDc3LMsW1/video/2000k_0X1080_64k_25/hls/index.m3u8
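The "combining" step above can be sketched in code. This is a minimal sketch, assuming the first M3U8 file is a master playlist whose `#EXT-X-STREAM-INF` entry points at the second (media) playlist through a relative URI; the function names `resolve_media_playlist` and `segment_urls` are hypothetical helpers, not part of any library.

```python
from urllib.parse import urljoin


def resolve_media_playlist(master_url: str, master_text: str) -> str:
    """Return the absolute URL of the first variant listed in a master playlist."""
    lines = [line.strip() for line in master_text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF"):
            # The variant URI is the next non-empty line that is not a tag.
            for candidate in lines[i + 1:]:
                if candidate and not candidate.startswith("#"):
                    return urljoin(master_url, candidate)
    # No variant entries found: assume the input already is a media playlist.
    return master_url


def segment_urls(playlist_url: str, playlist_text: str) -> list[str]:
    """Return absolute URLs for every segment line in a media playlist."""
    urls = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            urls.append(urljoin(playlist_url, line))
    return urls
```

Resolving with `urljoin` handles both relative and absolute segment URIs, whereas a regex that matches only `https?://` would skip relative entries.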

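When I send a GET request to this URL with the requests library, the playlist I get back lists files ending in .jpeg. I tried fetching these and writing them to files ending in .ts, and some of the resulting video segments are playable.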
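One possible explanation for segments that are named .jpeg yet partly play as video is that the server prepends an image header to ordinary MPEG-TS data. The question does not confirm this, so treat it as an assumption. MPEG-TS packets are 188 bytes long and each starts with the sync byte 0x47, so a sketch like the following can locate where the real stream begins; `find_ts_start` and `strip_fake_header` are hypothetical helper names.

```python
TS_PACKET_SIZE = 188   # fixed MPEG-TS packet length in bytes
TS_SYNC_BYTE = 0x47    # first byte of every MPEG-TS packet


def find_ts_start(data: bytes, packets_to_check: int = 3) -> int:
    """Return the offset of the first MPEG-TS packet in data, or -1 if none is found."""
    last_start = len(data) - TS_PACKET_SIZE * packets_to_check
    for offset in range(max(0, last_start + 1)):
        # Require several consecutive packets to line up, since a single 0x47
        # byte can also occur by chance inside a fake header.
        if all(
            data[offset + k * TS_PACKET_SIZE] == TS_SYNC_BYTE
            for k in range(packets_to_check)
        ):
            return offset
    return -1


def strip_fake_header(data: bytes) -> bytes:
    """Drop any bytes that precede the first MPEG-TS packet; otherwise return data unchanged."""
    start = find_ts_start(data)
    return data[start:] if start > 0 else data
```

If `find_ts_start` returns -1 for a downloaded segment, the assumption does not hold for that file (it may be encrypted or an error page), and the bytes are left as they are.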

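I have tried two strategies:

Use a for loop to request each URL in the M3U8 file. First, this is too slow. Second, some requests succeed and others fail.

Use aiohttp to make asynchronous requests with coroutines. First, this is faster. Second, none of the requests succeed.

Can someone with more experience help me? I would be very grateful!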

My code:

import asyncio
import aiohttp
import aiofiles
import os
import re


async def get_urls_from_m3u8(m3u8_url):
    async with aiohttp.ClientSession() as session:
        async with session.get(m3u8_url) as response:
            if response.status == 200:
                content = await response.text()
                urls = re.findall(r'https?://[^\s]+\.jpeg', content)
                return urls
            else:
                print(f"Failed to fetch M3U8 file, status code: {response.status}")
                return []


async def download_image(url, directory, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            async with aiohttp.ClientSession() as session:
                async with session.get(url, timeout=30) as response:
                    if response.status == 200:
                        ts_filename = re.search(r'\d+', url.split("/")[-1]).group()
                        ts_filepath = os.path.join(directory, ts_filename)

                        async with aiofiles.open(ts_filepath, 'wb') as ts_file:
                            # Read the response body in chunks
                            while True:
                                chunk = await response.content.read(1024)  # read 1 KB at a time
                                if not chunk:
                                    break
                                await ts_file.write(chunk)

                        print(f"Successfully saved {ts_filename}")
                        return True
                    else:
                        print(f"Failed to download {url}, status code: {response.status}")
                        return False
        except Exception as e:
            if attempt < max_retries:
                print(f"Attempt {attempt + 1} failed. Retrying...")
            else:
                print(f"Error downloading {url}: {e}")
                return False


async def main():
    directory = "downloaded_files_ts"
    if not os.path.exists(directory):
        os.makedirs(directory)

    m3u8_url = 'https://v8.longshengtea.com/yyv8/202310/06/2yJDc3LMsW1/video/2000k_0X1080_64k_25/hls/index.m3u8'
    urls = await get_urls_from_m3u8(m3u8_url)

    tasks = [download_image(url, directory) for url in urls]
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    asyncio.run(main())
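The code above starts one request per segment all at once and opens a new session for each. A server may throttle or refuse that many simultaneous connections, which could be why every request fails, although the question does not show the actual error. The following is a minimal sketch, not a confirmed fix, that caps concurrency with `asyncio.Semaphore` and reuses one session. The names `run_bounded` and `download_all`, the limit of 8, and the `User-Agent` header are assumptions.

```python
import asyncio


async def run_bounded(items, worker, limit=8):
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


async def download_all(urls, directory, limit=8):
    """Download every URL through one shared session (requires aiohttp)."""
    import os
    import aiohttp  # imported here so run_bounded stays usable without it

    # Assumption: the server may reject requests that lack browser-like headers.
    headers = {"User-Agent": "Mozilla/5.0"}
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(headers=headers, timeout=timeout) as session:

        async def fetch(indexed_url):
            index, url = indexed_url
            async with session.get(url) as response:
                response.raise_for_status()
                body = await response.read()
            # Zero-padded names keep the segments in playlist order on disk.
            path = os.path.join(directory, f"{index:05d}.ts")
            with open(path, "wb") as f:  # blocking write, acceptable for a sketch
                f.write(body)
            return path

        return await run_bounded(list(enumerate(urls)), fetch, limit)
```

In the question's `main`, this would stand in for the `asyncio.gather(*tasks)` call, for example `await download_all(urls, directory)`. Printing the exception type and message for a failed request is still the quickest way to learn whether the cause is throttling, a missing header, or something else.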

Answer 1

Use Python only to obtain the m3u8 link, then use aria2 to download from that m3u8 link.
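A rough sketch of that split follows. As far as I know, aria2 downloads the URLs it is given and does not expand an HLS playlist by itself, so this sketch assumes Python has already extracted the segment URLs and hands them to `aria2c` through an input file. The helper names, the output directory, and the concurrency value are assumptions.

```python
import subprocess
from pathlib import Path


def write_aria2_input(urls, input_path, directory="downloaded_files_ts"):
    """Write an aria2 input file: one URI per line, its options indented beneath it."""
    lines = []
    for index, url in enumerate(urls):
        lines.append(url)
        lines.append(f"  dir={directory}")
        # Zero-padded names keep the segments in playlist order.
        lines.append(f"  out={index:05d}.ts")
    Path(input_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return input_path


def run_aria2(input_path, concurrent=8):
    """Invoke aria2c on the input file (assumes aria2c is installed and on PATH)."""
    command = [
        "aria2c",
        f"--input-file={input_path}",
        f"--max-concurrent-downloads={concurrent}",
        "--max-tries=5",
    ]
    return subprocess.run(command, check=True)
```

A typical call would be `run_aria2(write_aria2_input(urls, "segments.txt"))`, after which the numbered `.ts` files can be concatenated in order. This leaves retries and connection handling to aria2 rather than to hand-written Python.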
