Concurrent connections in urllib3

Question

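I'm using a loop to make multiple requests to different websites. How can I do this through a proxy in urllib3?

The code reads a list of URLs and uses a for loop to connect to each site. However, it currently never gets past the first URL in the list. There is also a proxy in place.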

from urllib3 import ProxyManager

url_list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
for url in url_list:
    http = ProxyManager("PROXY-PROXY")
    http_get = http.request('GET', url, preload_content=False).read().decode()

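I've removed the URLs and proxy details from the code above. The first URL in the list works fine, but after that nothing else happens; the script just waits. I have tried the clear() method to reset the connection on each pass through the loop.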
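The question does not pin down why the loop stalls, so the following is only a hedged sketch of a more defensive sequential version, not a confirmed fix. It assumes a plain one-after-another loop is acceptable: a single ProxyManager is created once and reused, urllib3's default preloading reads each body and returns the connection to the pool, and an explicit timeout makes a hung request raise instead of waiting forever. The helper names make_proxy_manager and fetch_all_sequential and the 10-second timeout are illustrative assumptions.

```python
from typing import Iterable, List


def make_proxy_manager(proxy_url: str):
    """Build one urllib3 ProxyManager (hypothetical helper).

    urllib3 is imported lazily so this module can be loaded without it.
    """
    from urllib3 import ProxyManager
    return ProxyManager(proxy_url)


def fetch_all_sequential(http, urls: Iterable[str], timeout: float = 10.0) -> List[str]:
    """Fetch each URL in order through a single, reused manager.

    `http` is anything with urllib3's request(method, url, **kw) interface,
    e.g. the object returned by make_proxy_manager.
    """
    bodies = []
    for url in urls:
        # preload_content defaults to True: the body is read into resp.data and
        # the connection is released back to the pool for the next iteration.
        # The assumed timeout makes a stuck request raise instead of blocking.
        resp = http.request("GET", url, timeout=timeout)
        bodies.append(resp.data.decode())
    return bodies
```

With a real proxy this would be called as fetch_all_sequential(make_proxy_manager("PROXY-PROXY"), url_list), where "PROXY-PROXY" is the question's own placeholder.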

Answer 1

Since concurrent.futures was added in Python 3.2, using threads really isn't that much trouble:

from concurrent.futures import ThreadPoolExecutor, wait
from urllib3 import ProxyManager

url_list: list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
# One ProxyManager shared by all worker threads
http: ProxyManager = ProxyManager("PROXY-PROXY")

with ThreadPoolExecutor(max_workers=min(len(url_list), 20)) as thread_pool:
    # request() takes the HTTP method first, so pass 'GET' before the URL
    tasks = [thread_pool.submit(http.request, 'GET', url, preload_content=False)
             for url in url_list]
    wait(tasks)

all_responses: list = [task.result().read().decode() for task in tasks]
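A variant of the thread-pool approach, sketched under the same assumptions (one shared manager object with urllib3's request(method, url, **kw) interface), uses as_completed so each response is handled as soon as it finishes, and a failure for one URL is recorded instead of raising out of the result collection. fetch_concurrently, its max_workers=20 cap and its 10-second timeout are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable, Union


def fetch_concurrently(http, urls: Iterable[str], max_workers: int = 20,
                       timeout: float = 10.0) -> Dict[str, Union[str, Exception]]:
    """Run one GET per URL on a thread pool, sharing a single manager object.

    Returns {url: decoded body, or the exception that request raised}.
    """
    urls = list(urls)
    results: Dict[str, Union[str, Exception]] = {}
    if not urls:
        return results  # ThreadPoolExecutor rejects max_workers=0
    # Leaving the with-block waits for all workers and shuts the pool down.
    with ThreadPoolExecutor(max_workers=min(len(urls), max_workers)) as pool:
        # Map each future back to its URL so completion order does not matter.
        futures = {
            pool.submit(http.request, "GET", url, timeout=timeout): url
            for url in urls
        }
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result().data.decode()
            except Exception as exc:  # keep going; record this URL's failure
                results[url] = exc
    return results
```

Keying the results by URL rather than by position makes it easy to see which site failed; if duplicate URLs matter, a list of (url, result) pairs would be the safer container.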

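Later versions provide an event loop through asyncio. The problems I have run into with asyncio are usually about library portability (i.e. aiohttp via pydantic): most of these libraries are not pure Python and have external libc dependencies. That can be a problem if you have to support many Docker applications that may be built on musl-libc (Alpine) or glibc (everyone else).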
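One way to get an asyncio event loop without compiled HTTP dependencies, offered here as an assumption rather than what this answer's author used, is to keep urllib3 (which is itself pure Python) and push each blocking call onto a worker thread with asyncio.to_thread (Python 3.9+), then await them together with asyncio.gather. fetch_with_asyncio and its timeout default are hypothetical.

```python
import asyncio
from typing import List, Sequence


async def fetch_with_asyncio(http, urls: Sequence[str], timeout: float = 10.0) -> List[str]:
    """Await all GETs concurrently without an async HTTP library.

    `http` is assumed to be a urllib3 PoolManager/ProxyManager, or anything
    with the same request(method, url, **kw) method.
    """
    async def one(url: str) -> str:
        # The blocking urllib3 call runs in the default thread pool executor.
        resp = await asyncio.to_thread(http.request, "GET", url, timeout=timeout)
        return resp.data.decode()

    # gather returns results in the same order as `urls`, whatever order they finish in.
    return list(await asyncio.gather(*(one(u) for u in urls)))
```

This still uses threads under the hood; it only moves the orchestration into asyncio, which can be convenient when the rest of an application is already async.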

Answer 2

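Unfortunately, urllib3 is synchronous and blocking. You can use it with threads, but that is cumbersome and often causes more problems. The main approach these days is to use some form of asynchronous networking; Twisted and asyncio (and possibly aiohttp) are popular packages.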

Here is an example using the trio framework and asks:

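import asks
import trio

# Older asks releases needed asks.init('trio'); current ones detect the event loop themselves.
path_list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']

results = []

async def grabber(s, path):
    # Each task awaits its own GET; trio runs other tasks while this one waits on I/O.
    r = await s.get(path)
    results.append(r)  # responses arrive in completion order, not path_list order

async def main(path_list):
    # asks.Session defaults to a single connection; allow one per URL.
    s = asks.Session(connections=len(path_list))
    async with trio.open_nursery() as n:
        for path in path_list:
            # start_soon replaced the old nursery.spawn() API
            n.start_soon(grabber, s, path)

trio.run(main, path_list)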
