Scrapy error: Error downloading - Could not open CONNECT tunnel

Question  Votes: 0  Answers: 2

I wrote a spider to crawl https://tecnoblog.net/categoria/review/, but when I run it I get an error:

2015-05-19 15:13:20+0100 [scrapy] INFO: Scrapy 0.24.5 started (bot: reviews)
2015-05-19 15:13:20+0100 [scrapy] INFO: Optional features available: ssl, http11
2015-05-19 15:13:20+0100 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'reviews.spiders', 'SPIDER_MODULES': ['reviews.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'reviews'}
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled downloader middlewares: ProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RotateUserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled item pipelines: 
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Spider opened
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6030
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Web service listening on 127.0.0.1:6087
2015-05-19 15:13:25+0100 [tecnoblog] DEBUG: Redirecting (301) to <GET https://tecnoblog.net/categoria/review/> from <GET http://tecnoblog.net/categoria/review/>
2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel.
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Closing spider (finished)
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Dumping Scrapy stats:
    {'downloader/exception_count': 1,
     'downloader/exception_type_count/scrapy.core.downloader.handlers.http11.TunnelError': 1,
     'downloader/request_bytes': 644,
     'downloader/request_count': 2,
     'downloader/request_method_count/GET': 2,
     'downloader/response_bytes': 501,
     'downloader/response_count': 1,
     'downloader/response_status_count/301': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2015, 5, 19, 14, 13, 26, 227904),
     'log_count/DEBUG': 3,
     'log_count/ERROR': 1,
     'log_count/INFO': 7,
     'scheduler/dequeued': 2,
     'scheduler/dequeued/memory': 2,
     'scheduler/enqueued': 2,
     'scheduler/enqueued/memory': 2,
     'start_time': datetime.datetime(2015, 5, 19, 14, 13, 20, 217735)}
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Spider closed (finished)

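Any idea why this is happening? The relevant line is: 2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel. I crawled this site within the last month... How can I fix it? I tried changing the start URL to "http" instead of "https", but the request gets redirected back to https :S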

scrapy
2 Answers
7
votes

You are probably trying to connect to an https site through an http-only proxy.

You can use an online HTTPS proxy tester to check whether your proxy supports https, or run the Linux curl command through the proxy:

curl -x http://111.222.333.444:80 -L https://myip.ht
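
The same check can be sketched in Python using only the standard library. This is a rough illustration, not part of Scrapy: the function names below are hypothetical, and the proxy and target hosts are placeholders you would supply yourself. It sends a bare CONNECT request to the proxy and treats any 2xx status as "tunnel opened", which is what an HTTPS-capable proxy is expected to return.

```python
import socket


def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the raw CONNECT request a client sends to an HTTP proxy."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def tunnel_established(status_line: bytes) -> bool:
    """Return True if the proxy's status line carries a 2xx code."""
    parts = status_line.split(None, 2)
    return (
        len(parts) >= 2
        and parts[0].startswith(b"HTTP/")
        and parts[1][:1] == b"2"
    )


def proxy_supports_https(proxy_host: str, proxy_port: int,
                         target_host: str, timeout: float = 10.0) -> bool:
    """Ask the proxy to open a tunnel to target_host:443 and report the result.

    Hypothetical helper: it only checks the first status line and does not
    handle proxy authentication.
    """
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_connect_request(target_host))
        reply = sock.recv(4096)
    return tunnel_established(reply.split(b"\r\n", 1)[0])
```

If proxy_supports_https(...) returns False for your proxy, the proxy is refusing or failing the CONNECT request, which is the situation Scrapy reports as TunnelError.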

0
votes

A similar example from 2025, with Scrapy v2.11.2:

scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy XX.YY.ZZ.KK:8888 [{'status': 500, 'reason': b'Unable to connect'}]

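Another possible cause is that you, or someone who used that proxy IP before you, hit the URL too many times, and the site has simply blocked the proxy IP.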

You can try reducing the number of requests per minute or using a different proxy.
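
Both suggestions could look roughly like this in a Scrapy project. The setting names (DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN, AUTOTHROTTLE_ENABLED) and the request.meta["proxy"] key are standard Scrapy features, but the values, the RotatingProxyMiddleware class, and the proxy URLs are assumptions made up for illustration; treat it as a sketch to adapt, not a drop-in fix.

```python
import itertools

# Illustrative values only; these would go in the project's settings.py.
THROTTLE_SETTINGS = {
    "DOWNLOAD_DELAY": 2.0,                # seconds between requests to a site
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # no parallel requests per domain
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to latency
}


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware that cycles through a proxy list.

    It only sets request.meta["proxy"]; Scrapy's built-in HttpProxyMiddleware
    is what actually uses that value, so this class has to run before it.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxies)

    def process_request(self, request, spider=None):
        # Assign the next proxy in round-robin order to this request.
        request.meta["proxy"] = next(self._cycle)
        return None  # continue normal processing
```

To use something like this, the class would be registered under DOWNLOADER_MIDDLEWARES with an order number below that of HttpProxyMiddleware, and given a list of proxies that are known to support HTTPS.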
