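I'm trying to extract the HTML from the web page shown in the code below. The same code works for other websites, but this site raises an error. What is causing it?
Here is the code: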
import requests

url = 'https://clasificadosonline.com/'  # URL of the webpage to scrape

try:
    response = requests.get(url)
    response.raise_for_status()
    html_content = response.text
    # Print the HTML content
    print(html_content)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
Here is the error I get:
"C:\Users\17874\OneDrive - University of Puerto Rico\Desktop\WebScraping\venv\Scripts\python.exe" "C:\Users\17874\OneDrive - University of Puerto Rico\Desktop\WebScraping\venv\Clasificados Online.py"
Request failed: HTTPSConnectionPool(host='clasificadosonline.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))
Process finished with exit code 0
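The handshake failure most likely means the server only accepts older cipher suites or legacy renegotiation, which recent OpenSSL builds refuse by default. Try forcing the SSL connection with the DEFAULT@SECLEVEL=1 cipher string: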
import ssl
import warnings

import requests
import requests.packages.urllib3.exceptions as urllib3_exceptions

warnings.simplefilter("ignore", urllib3_exceptions.InsecureRequestWarning)


class TLSAdapter(requests.adapters.HTTPAdapter):
    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        # Lower OpenSSL's security level so older cipher suites are offered
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
        # 0x4 is OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT flag
        ctx.options |= 0x4
        kwargs["ssl_context"] = ctx
        return super(TLSAdapter, self).init_poolmanager(*args, **kwargs)


url = "https://clasificadosonline.com/"  # URL of the webpage to scrape

with requests.session() as s:
    s.mount("https://", TLSAdapter())
    response = s.get(url)
    response.raise_for_status()
    html_content = response.text
    print(html_content)
This prints:
<script type="text/javascript">
<!--
if (screen.width <= 480) {
document.location = "https://www.clasificadosonline.com/m/";
}
//-->
</script>
<html><!-- #BeginTemplate "/Templates/master.dwt" --><!-- DW6 -->
...
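To check what actually differs between the failing request and the adapter above, a small standard-library probe can attempt one handshake with the default settings and one with the relaxed settings, then report the negotiated protocol and cipher. This is only a sketch: the helper names build_context and probe are made up for illustration, and unlike the adapter it leaves hostname checking enabled, so a certificate or hostname problem would also be reported as a failure.

```python
import socket
import ssl


def build_context(relaxed: bool) -> ssl.SSLContext:
    """Return a verifying TLS context, optionally with the relaxed settings."""
    ctx = ssl.create_default_context()
    if relaxed:
        # Same two settings as the adapter: older ciphers are offered and
        # OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT flag (0x4) is switched on.
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
        ctx.options |= 0x4
    return ctx


def probe(host: str, relaxed: bool, port: int = 443, timeout: float = 10.0) -> str:
    """Try one TLS handshake and describe the outcome as a string."""
    ctx = build_context(relaxed)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                name, _protocol, bits = tls.cipher()
                return f"ok: {tls.version()} {name} ({bits} bits)"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"failed: {exc}"


if __name__ == "__main__":
    for relaxed in (False, True):
        print(f"relaxed={relaxed}:", probe("clasificadosonline.com", relaxed))
```

If the default attempt fails and the relaxed one succeeds, the cipher name in the second line shows what the server insisted on. Because the relaxed settings weaken TLS, it may be worth mounting the adapter only on this site's URL prefix rather than on every https:// request.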