How do I correctly run multiple GET requests with coroutines?


I am learning requests-html and want to know how to run multiple tasks asynchronously.

When I try to run async tasks with requests-html, I get an error saying 'coroutine' object is not callable, along with the warning coroutine 'async_get_url' was never awaited:

Traceback (most recent call last):
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/4-projects-workflow/.web_scrapping/.my_sdk/./main.py", line 21, in <module>
    result = a_hs.run(*[async_get_url(url) for url in urls])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/.py_env/lib64/python3.11/site-packages/requests_html.py", line 771, in run
    tasks = [
            ^
  File "/c/Users/olube/Desktop/v1-logic/options-tactical/eng-cmp--data_sec_ops/.desktop-instance/.py_env/lib64/python3.11/site-packages/requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
                          ^^^^^^
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'async_get_url' was never awaited

Here is my code:

#!/usr/bin/env python

from requests_html import HTMLSession as hs, AsyncHTMLSession as a_hs
from pprint import pprint
from typing import Any as any

def get_url(url: str) -> any:
  s = hs()
  r = s.get(url)
  
  return r

async def async_get_url(url: str) -> any:
  s = a_hs()
  r = await s.get(url)
  
  return r

if __name__ == "__main__":
  urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')
  result = a_hs.run(*[async_get_url(url) for url in urls])

  pprint(result)

python-3.x web-scraping async-await python-requests-html
1 Answer

If I understand you correctly, you are looking for something like this:

from requests_html import AsyncHTMLSession
from pprint import pprint
from typing import Any

async def get_url(url: str) -> Any:
    print(url)
    return await session.get(url)

session = AsyncHTMLSession()

urls = ["https://python.org/", "https://reddit.com/", "https://google.com/"]

# Build callables, not coroutine objects; the default argument binds the
# current url to each lambda.
coroutines = [lambda url=url: get_url(url) for url in urls]

result = session.run(*coroutines)
pprint(result)

The list comprehension builds a list of callables rather than coroutine objects; the lambda is what keeps the coroutines from being created immediately. As the traceback shows, run() calls each argument it receives (asyncio.ensure_future(coro())), so it must be given callables that return coroutines, which is why passing already-created coroutine objects fails with 'coroutine' object is not callable. The list is then passed to the run() method with argument unpacking. An equivalent functools.partial variant is sketched after the output below.
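
As a side note (my own illustration, not part of the original answer), the url=url default argument matters because of Python's late binding in closures:

# Illustration only: why the lambda needs the url=url default argument.
# Without it, every lambda closes over the same loop variable, so by the
# time run() calls them they would all request the last URL:
late_bound = [lambda: get_url(url) for url in urls]        # all end up using urls[-1]
# Binding the current value as a default captures each URL separately:
bound_now = [lambda url=url: get_url(url) for url in urls]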

Output:

https://python.org/
https://reddit.com/
https://google.com/
[<Response [200]>, <Response [200]>, <Response [200]>]
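
If the lambda feels awkward, functools.partial is another way to build the deferred calls that run() expects. A minimal sketch, reusing the session, get_url, and urls defined above:

from functools import partial

# Each partial is a callable; run() invokes it to create the coroutine.
coroutines = [partial(get_url, url) for url in urls]
result = session.run(*coroutines)
pprint(result)

Unlike the lambda version, partial captures the argument value at creation time, so no default-argument trick is needed.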