Python web scraping: "[D:websockets.client] > GET %s HTTP/1.1 [D:websockets.client] > %s: %s", not all results are shown

Question

I am trying to do web scraping with Python 3.10 and the requests-html library, version 0.10.0.

Here is the code:

from requests_html import HTMLSession

url = 'https://bodysolid-europe.com/collections/all'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)

products = r.html.xpath('/html/body/div[2]/div[2]/div', first=True)

for item in products.absolute_links:
    r = s.get(item)
    print(r.html.find('header.product-header', first=True).text)

When I try to extract information from the URL via XPath, the console shows the following output:

[D:urllib3.connectionpool] Starting new HTTPS connection (%d): %s:%s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:asyncio] Using proactor: %s
[D:websockets.client] = connection is CONNECTING
[D:websockets.client] > GET %s HTTP/1.1
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] > %s: %s
[D:websockets.client] < HTTP/1.1 %d %s

It does not show all of the information for each item, only a small part, like this:

[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Body-Solid Europe
Best Fitness Dumbbell Rack BFDR10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Bench BFFID10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Mountain Climber BFMC10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Body-Solid Europe
Best Fitness Multi-Station Gym BFMG30
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Center Drive Elliptical BFE1
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Olympic Bench BFOB10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Functional Trainer BFFT10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Leg Developer and Preacher Curl Attachment BFPL10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Best Fitness
Best Fitness Inversion Table BFINVER10
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
[D:urllib3.connectionpool] %s://%s:%s "%s %s %s" %s %s
Body-Solid Europe

Most of the output is just:

[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s
[D:websockets.client] < %s

I don't know what the problem is. I installed pyppeteer==1.0.0 because I was previously getting this:

<?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>. The specified key does not exist.</Message><Details> No such object: chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip</Details></Error>

But now it shows "[D:websockets.client] < %s [D:websockets.client] < %s".

I need to fix the errors in the output so that I can get the information from the URL by web scraping.

python web-scraping
1 Answer

I ran into the same problem. After I changed pyppeteer to version 1.0.2, those annoying messages stopped appearing. I also added these two lines:

import os

PYPPETEER_CHROMIUM_REVISION = "1263111"
os.environ["PYPPETEER_CHROMIUM_REVISION"] = PYPPETEER_CHROMIUM_REVISION
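For anyone wondering where those lines go, here is a rough sketch of how they might fit into the script from the question. pyppeteer reads PYPPETEER_CHROMIUM_REVISION when it is imported, so the variable has to be set before requests_html is imported. The "[D:...] %s" lines look like DEBUG log records rather than errors, so the sketch also raises the level of those loggers. That only hides the messages and does not change what gets scraped. The helper names (configure_chromium_revision, quiet_debug_loggers, print_product_headers) are made up for illustration, and I am assuming revision 1263111 is available for your platform.

```python
import logging
import os

# Revision number taken from the answer above. Whether a Chromium build
# exists for it on your platform is an assumption you should verify.
CHROMIUM_REVISION = "1263111"

# Logger names as they appear in the question's console output.
NOISY_LOGGERS = ("websockets.client", "urllib3.connectionpool", "asyncio")


def configure_chromium_revision(revision=CHROMIUM_REVISION):
    """Tell pyppeteer which Chromium revision to download (hypothetical helper).

    Call this before importing requests_html or pyppeteer, because the
    variable is read at import time.
    """
    os.environ["PYPPETEER_CHROMIUM_REVISION"] = revision


def quiet_debug_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the level of the listed loggers so their DEBUG records are dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)


def print_product_headers(url):
    """Same loop as in the question, with guards for elements that are missing."""
    # Imported here so that the environment variable is already set.
    from requests_html import HTMLSession

    session = HTMLSession()
    page = session.get(url)
    page.html.render(sleep=1)

    products = page.html.xpath('/html/body/div[2]/div[2]/div', first=True)
    if products is None:
        print("Product container not found; the XPath may no longer match.")
        return

    for link in products.absolute_links:
        detail = session.get(link)
        header = detail.html.find('header.product-header', first=True)
        if header is not None:
            print(header.text)


# Typical order of calls (not executed here, since it needs network access):
#   configure_chromium_revision()
#   quiet_debug_loggers()
#   print_product_headers('https://bodysolid-europe.com/collections/all')
```

If the messages still show up after this, something else in your environment is probably setting up logging at DEBUG level, and that would be the place to look.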