Selenium web scraping: SSL connection problem


I am trying to do web scraping with Selenium. The website I want to scrape requires authentication, so my goal is to log in and then scrape some user-related data.

First I try to log in. I navigate to https://my.pitchbook.com/, and the site automatically redirects me to the following link: https://loginprod.morningstar.com/loginstate=hKFo2SA1NkY0R2IwakMyYVFwTXNGSF8zampvSVRwU21abWhOZqFupWxvZ2luo3RpZNkgX0NacFNHQTFfc29iOC1lckFEc3JGaFRaWHBNZkJ1Rk2jY2lk2SByWUMwT1V4 SDRpV05jbXpPanVwQjh6UnN0dWtlZXZyUg&client=rYC0OUxH4iWNcmzOjupB8zRstukeevrR&protocol=oauth2&redirect_uri=https%3A%2F%2Fmy.pitchbook.com%2Fauth0%2Fcallback&source=bus0155&response_type=code

After the redirect, a login page appears and I try to log in to the site. However, I get an error:

[Image: screenshot of the error message (not available)]

I looked for a solution to the error, and I even added the following line:

chrome_options.add_argument('--ignore-certificate-errors')  # Ignore certificate errors

Code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument('--ignore-certificate-errors')  # Ignore certificate errors

# Set up WebDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_options)

# Navigate to the initial page
driver.get("https://my.pitchbook.com/")

try:
    # Increase the wait time
    wait = WebDriverWait(driver, 20)  

    # Wait for the URL to change after redirection
    wait.until(EC.url_changes("https://my.pitchbook.com/"))


    # Wait for the login form to load on the redirected page
    email_element = wait.until(EC.presence_of_element_located((By.ID, "emailInput")))  
    password_element = wait.until(EC.presence_of_element_located((By.ID, "passwordInput")))  
    login_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "mds-button___ctrsi"))) 

    # Enter credentials (replace with actual credentials)
    email_element.send_keys("email")
    password_element.send_keys("password")
    login_button.click()

    # Wait for the login process to complete
    time.sleep(5)  # Adjust as necessary


    # Now you are authenticated, navigate to the desired page or interact with elements
    driver.get("https://my.pitchbook.com/dashboard/home")

    # Locate the element you want to interact with
    user_name = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "button__caption_a580497eb72793758caf95a9250e8342")))
    value = user_name.text

    print(value)

    # Wait before closing
    time.sleep(5)

finally:
    # Close the browser
    driver.quit()

Any help would be greatly appreciated! (I am new to web scraping and Selenium.)

python selenium-webdriver web-scraping
1 Answer

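A few things seem to need correcting. First, use a try/except statement in your code (it currently has try/finally but no except clause). That way, if a step fails, you can catch the exception, see which step failed, and decide whether to retry or stop, instead of the script simply crashing.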
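
As a rough sketch of that advice, the snippet below wraps the navigation and the first wait in try/except. The helper names `describe_failure` and `open_login_page` are made up for illustration, the element ID `emailInput` is taken from the question's code, and whether an SSL failure actually surfaces as a `WebDriverException` (rather than only as a browser console message) is an assumption that depends on the browser and driver.

```python
import sys


def describe_failure(exc):
    """Return a short one-line summary of an exception (hypothetical helper)."""
    lines = str(exc).strip().splitlines()
    summary = lines[0] if lines else "(no message)"
    return f"{type(exc).__name__}: {summary}"


def open_login_page(driver, url="https://my.pitchbook.com/"):
    """Load the page and wait for the login form; return True on success."""
    # Imported lazily so describe_failure() works without Selenium installed.
    from selenium.common.exceptions import TimeoutException, WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        driver.get(url)
        # "emailInput" is the element ID used in the question's code.
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.ID, "emailInput"))
        )
        return True
    except TimeoutException as exc:
        # The page responded, but the login form never appeared.
        print("Timed out:", describe_failure(exc), file=sys.stderr)
    except WebDriverException as exc:
        # Any other driver-level failure, possibly a net::ERR_... error.
        print("Browser error:", describe_failure(exc), file=sys.stderr)
    return False
```

`TimeoutException` is caught first because it is a subclass of `WebDriverException`. Calling `open_login_page(driver)` with the `driver` from the question, inside the existing try/finally, keeps `driver.quit()` in place.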

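Second, keep in mind that an SSL handshake error occurs when something prevents a secure connection to the website from being established (I know, the name of the error sounds funny). A likely cause is antivirus software installed on your machine. If you have one, try disabling it temporarily and run the script again.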
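
One way to check whether something on the machine is interfering with TLS, independent of Selenium, is to attempt a handshake from plain Python. This is only a diagnostic sketch: `check_tls` is a hypothetical helper, and the idea that an intercepting product would show up as the certificate issuer is an assumption about how such software typically works, not something confirmed for this case.

```python
import socket
import ssl


def check_tls(host, port=443, timeout=10.0):
    """Try a TLS handshake with host:port and return (ok, detail)."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                # The issuer is a sequence of RDNs, each a tuple of (key, value) pairs.
                issuer = dict(pair for rdn in cert.get("issuer", ()) for pair in rdn)
                org = issuer.get("organizationName", "unknown")
                return True, f"{tls.version()}, issuer: {org}"
    except OSError as exc:
        # ssl.SSLError and connection errors are both subclasses of OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, `print(check_tls("loginprod.morningstar.com"))` uses the host from the redirect URL in the question. If the handshake succeeds here but the issuer names a security product rather than a public certificate authority, that would point toward local interception.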

Alternatively, try running the same code with Mozilla Firefox, Opera, Safari, or any other secure browser.
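
A minimal sketch of switching browsers, assuming Selenium 4 and that the matching driver (for example geckodriver for Firefox) is on the PATH or can be located by Selenium automatically. `make_driver` and `SUPPORTED_BROWSERS` are illustrative names, not part of Selenium.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox")


def make_driver(browser="firefox"):
    """Create a WebDriver for the named browser (hypothetical helper)."""
    name = browser.strip().lower()
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")

    # Imported lazily so the name check above works without Selenium installed.
    from selenium import webdriver

    if name == "firefox":
        options = webdriver.FirefoxOptions()
        return webdriver.Firefox(options=options)

    options = webdriver.ChromeOptions()
    return webdriver.Chrome(options=options)
```

Replacing the `webdriver.Chrome(...)` line in the question with `driver = make_driver("firefox")` leaves the rest of the script unchanged, which makes it easy to see whether the error is specific to Chrome.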

Thank you! Take care!
