I'm developing a web-scraping project in Python using Selenium with ChromeDriver, and the site requires a client certificate to access its pages. I have 2 scenarios I must handle:
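I'm on Windows/Windows Server and use the AutoSelectCertificateForUrls registry key, which automatically selects a certificate based on the URL (or a wildcard pattern). But that doesn't help with scenario #2 above.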
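Ideally, I'd like to pass a URL and a certificate name to the Python script and have Chrome use that certificate when it visits the URL, but I haven't found a way to do this. So far I have: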
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select

url = "https://example.com"  # placeholder: the page that requires a client certificate

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--allow-insecure-localhost')
chrome_options.add_argument('--ignore-ssl-errors=yes')
chrome_options.add_argument('--ignore-certificate-errors')
# Pass the options in; without options=..., the arguments above are never applied
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
# ...
# scrape code here
```
Does anyone have good step-by-step instructions for handling this?
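Because the AutoSelectCertificateForUrls policy mentioned in the question is just a set of registry values, one way to approach the "URL plus certificate name" goal is to rewrite those values from Python right before starting Chrome, so the new Chrome process starts with the mapping for this run. This is a minimal sketch under stated assumptions: it assumes Chrome honors the policy under HKEY_CURRENT_USER on your machine (if it doesn't, the same key under HKEY_LOCAL_MACHINE works but needs admin rights), and it assumes the certificate can be identified by its subject CN. `build_policy_entry` and `write_policy` are hypothetical helper names. Each entry is a JSON string with a URL `pattern` and a `filter` on the certificate's ISSUER and/or SUBJECT fields.

```python
import json

# Policy key assumed to be honored under HKEY_CURRENT_USER (HKLM also works, needs admin).
POLICY_KEY = r"SOFTWARE\Policies\Google\Chrome\AutoSelectCertificateForUrls"


def build_policy_entry(url_pattern, subject_cn=None, issuer_cn=None):
    """Return one AutoSelectCertificateForUrls value as a JSON string."""
    cert_filter = {}
    if issuer_cn:
        cert_filter["ISSUER"] = {"CN": issuer_cn}
    if subject_cn:
        cert_filter["SUBJECT"] = {"CN": subject_cn}
    return json.dumps({"pattern": url_pattern, "filter": cert_filter})


def write_policy(entries):
    """Replace all AutoSelectCertificateForUrls values for the current user.

    Windows only: winreg is imported here so the module still imports elsewhere.
    """
    import winreg

    access = winreg.KEY_READ | winreg.KEY_WRITE
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, POLICY_KEY, 0, access) as key:
        # Collect the existing value names first, then delete them, so stale
        # URL-to-certificate mappings from a previous run do not linger.
        old_names = []
        index = 0
        while True:
            try:
                old_names.append(winreg.EnumValue(key, index)[0])
            except OSError:  # raised when there are no more values
                break
            index += 1
        for name in old_names:
            winreg.DeleteValue(key, name)

        # List policies are stored as string values named "1", "2", ...
        for number, entry in enumerate(entries, start=1):
            winreg.SetValueEx(key, str(number), 0, winreg.REG_SZ, entry)
```

For example, calling `write_policy([build_policy_entry("https://[*.]example.com", subject_cn="Scraper Client")])` before `webdriver.Chrome(...)` maps that site to the certificate whose subject CN is "Scraper Client" (a placeholder name). Because the policy is shared by every Chrome instance for that user, two runs that need different certificates for the same URL should not overlap.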
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

import config  # Import your config file (config.py, described below)


def setup_driver(cert_path, cert_password):
    chrome_options = Options()
    chrome_options.add_argument('--allow-insecure-localhost')
    chrome_options.add_argument('--ignore-ssl-errors=yes')
    chrome_options.add_argument('--ignore-certificate-errors')
    # NOTE: the two switches below are not documented Chromium command-line
    # switches; Chrome on Windows normally offers client certificates from the
    # Windows certificate store, so verify they have any effect before relying on them.
    chrome_options.add_argument(f'--client-certificate-file={cert_path}')
    chrome_options.add_argument(f'--client-certificate-password={cert_password}')
    service = Service('path/to/chromedriver')  # Adjust the path to your ChromeDriver
    return webdriver.Chrome(service=service, options=chrome_options)


def access_url(url, cert_path, cert_password):
    driver = setup_driver(cert_path, cert_password)
    driver.get(url)
    # Your scraping code goes here
    return driver


if __name__ == "__main__":
    driver = access_url(config.url, config.cert_path, config.cert_password)
    driver.quit()
```
Create a separate file (e.g. `config.py`) for the URL and credentials:
```python
# config.py
url = "https://example.com"  # Change this to the site you want to scrape
cert_path = "path/to/your_certificate.pfx"  # Update to your certificate path
cert_password = "your_cert_password"  # Your certificate's password
```
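The `.pfx` file referenced in `config.py` still has to end up somewhere Chrome can find it. Chrome on Windows offers client certificates from the Windows certificate store, so one option is to import the file into the current user's Personal store once, for example with the built-in `certutil` tool, and then let AutoSelectCertificateForUrls pick it. This is a sketch: `build_import_command` and `import_pfx` are hypothetical helper names, and it uses certutil's `-f` (overwrite), `-user` (CurrentUser store) and `-p` (PFX password) options with `-importPFX`.

```python
import subprocess


def build_import_command(pfx_path, password):
    """Build a certutil command that imports a .pfx into the current user's Personal store."""
    # -f: overwrite existing entries; -user: CurrentUser store instead of LocalMachine;
    # -p: PFX password (note: it is visible in the process list while certutil runs).
    return ["certutil", "-f", "-user", "-p", password, "-importPFX", pfx_path]


def import_pfx(pfx_path, password):
    """Run certutil (Windows only); raises CalledProcessError if the import fails."""
    return subprocess.run(
        build_import_command(pfx_path, password),
        check=True,
        capture_output=True,
        text=True,
    )
```

Usage would be `import_pfx(config.cert_path, config.cert_password)`, run once per machine or user rather than on every scrape. If exposing the password on the command line is a concern, PowerShell's `Import-PfxCertificate` with a `SecureString` password is an alternative.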