scrapy - ResponseNeverReceived('SSL routines', '', 'unexpected eof while reading')


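I'm having trouble scraping a website with Scrapy. I'm sending a GET request to a specific API endpoint, but the request fails with an SSL error. Below are the request code and the resulting error message.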


    # Run inside `scrapy shell`, which provides fetch()
    from scrapy import Request

    url = 'https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229'

    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/json; charset=utf-8",
        "DNT": "1",
        "Referer": "https://www.macmap.org/en//query/results?reporter=764&partner=004&product=010229&level=6",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }


    request = Request(
        url=url,
        method='GET',
        dont_filter=True,
        headers=headers,
    )

    fetch(request)

However, I get the following output:

2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
---------------------------------------------------------------------------
ResponseNeverReceived                     Traceback (most recent call last)
Cell In[1], line 27
      5 headers = {
      6     "Accept": "application/json, text/javascript, */*; q=0.01",
      7     "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
   (...)
     16     "X-Requested-With": "XMLHttpRequest",
     17 }
     20 request = Request(
     21     url=url,
     22     method='GET',
     23     dont_filter=True,
     24     headers=headers,
     25 )
---> 27 fetch(request)

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/scrapy/shell.py:110, in Shell.fetch(self, request_or_url, spider, redirect, **kwargs)
    108 response = None
    109 try:
--> 110     response, spider = threads.blockingCallFromThread(
    111         reactor, self._schedule, request, spider)
    112 except IgnoreRequest:
    113     pass

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/internet/threads.py:119, in blockingCallFromThread(reactor, f, *a, **kw)
    117 result = queue.get()
    118 if isinstance(result, failure.Failure):
--> 119     result.raiseException()
    120 return result

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/python/failure.py:475, in Failure.raiseException(self)
    474 def raiseException(self):
--> 475     raise self.value.with_traceback(self.tb)

ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]

The package versions I'm using are:

  • Scrapy==2.5.0
  • pyOpenSSL==22.0.0
  • cryptography==38.0.4
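A quick way to gather these numbers for a report is `scrapy version -v`, which prints the versions of Scrapy's main dependencies, including pyOpenSSL and cryptography. As a stdlib-only alternative, the sketch below looks them up with `importlib.metadata` (Python 3.8+); the helper name `collect_versions` is hypothetical. It also records the OpenSSL build that Python's own `ssl` module uses, which may differ from the OpenSSL that pyOpenSSL/cryptography are linked against, so treat that entry as a hint rather than the definitive TLS library version.

```python
import platform
import ssl
from importlib import metadata


def collect_versions(packages=("Scrapy", "pyOpenSSL", "cryptography", "Twisted")):
    """Return installed versions of the given distributions plus Python/OpenSSL info."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    versions["Python"] = platform.python_version()
    # OpenSSL used by Python's ssl module; pyOpenSSL/cryptography may bundle another build
    versions["ssl.OPENSSL_VERSION"] = ssl.OPENSSL_VERSION
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```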
python-3.x ubuntu ssl scrapy pyopenssl

1 Answer

Here is a simple spider that extracts the data from that API endpoint:

    import json
    from urllib.parse import urlencode

    import scrapy

    # Browser-like headers for the XHR request the macmap.org page makes
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Content-Type': 'application/json; charset=utf-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive',
        'Referer': 'https://www.macmap.org/en//query/results?reporter=826&partner=710&product=010229&level=6',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
    }


    class ProductSpider(scrapy.Spider):
        name = "product"

        def start_requests(self):
            # Query parameters for the ntlc-products endpoint
            params = {
                'countryCode': '826',
                'level': '8',
                'code': '010229',
            }

            base_url = "https://www.macmap.org/api/v2/ntlc-products"
            url_with_params = f"{base_url}?{urlencode(params)}"

            yield scrapy.Request(url_with_params, callback=self.parse, headers=headers)

        def parse(self, response):
            # The body is a JSON array of {"Code": ..., "Name": ...} records
            records = json.loads(response.text)

            # Yield each record as a separate item
            for record in records:
                yield record

I changed the country code (from 764 to 826) to get more results. You can change it back to the original one.
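A minimal sketch of switching the country code, with the URL construction from `start_requests` factored into a hypothetical helper `ntlc_products_url` (not part of the answer). With "764" it reproduces the URL from the question. You may also want to update the `reporter` value in the Referer header to match, although whether the API checks it is not known.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.macmap.org/api/v2/ntlc-products"


def ntlc_products_url(country_code, level="8", code="010229"):
    """Build the ntlc-products URL the same way the spider does (urlencode + f-string)."""
    params = {"countryCode": country_code, "level": level, "code": code}
    return f"{BASE_URL}?{urlencode(params)}"


print(ntlc_products_url("826"))  # country code used in the answer
print(ntlc_products_url("764"))  # original country code from the question
```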

The parse method yields one record at a time. Alternatively, you can return all of them together in a single batch.
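The two options can be sketched with plain functions that take the response body (what `response.text` holds inside the spider), so they run without Scrapy; the function names and the `products` key are hypothetical. The sample payload reuses two records from the log excerpt, serialized as a JSON array. In a real spider the batch variant still has to yield a dict (or another item type) rather than a bare list, since Scrapy callbacks are expected to produce requests or items.

```python
import json

# Two records copied from the log excerpt, serialized as a JSON array
SAMPLE_BODY = json.dumps([
    {"Code": "0102291090",
     "Name": "Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Other"},
    {"Code": "0102292100",
     "Name": "Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: For slaughter"},
])


def parse_one_by_one(body):
    """Same logic as the answer's parse(): one item per record."""
    for record in json.loads(body):
        yield record


def parse_as_batch(body):
    """Batch variant: a single item holding every record under a hypothetical 'products' key."""
    yield {"products": json.loads(body)}


print(len(list(parse_one_by_one(SAMPLE_BODY))))  # 2 items
print(len(list(parse_as_batch(SAMPLE_BODY))))    # 1 item
```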

Log excerpt:

2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291050', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Bulls of the Schwyz, Fribourg and spotted Simmental breeds, other than for slaughter'}                           
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291090', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Other'}                                                                                                          
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102292100', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: For slaughter'}                                                                          
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102292910', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: Other: Young male bovine animals, intended for fattening'}                               
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291030', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the Schwyz and Fribourg breeds, other than for slaughter'}                                            
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291040', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the spotted Simmental breed, other than for slaughter'}                                               
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291020', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the grey, brown or yellow mountain breeds and spotted Pinzgau breed, other than for slaughter'}
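The `\xa0` in the names is how Python's repr displays a non-breaking space (U+00A0) contained in the API data. If plain spaces are preferred downstream, a minimal cleanup could look like this; `clean_record` is a hypothetical helper that could be applied in `parse` before yielding each record.

```python
def clean_record(record):
    """Return a copy of a record with non-breaking spaces in 'Name' replaced by plain spaces."""
    cleaned = dict(record)  # leave the original dict untouched
    cleaned["Name"] = cleaned["Name"].replace("\u00a0", " ")
    return cleaned


record = {
    "Code": "0102291090",
    "Name": "Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Other",
}
print(clean_record(record)["Name"])
# Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80 kg: Other
```

`unicodedata.normalize("NFKC", ...)` would also turn U+00A0 into a space, but it rewrites other compatibility characters too, so the targeted replace is the narrower choice.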