I'm trying to do web scraping with Selenium. The site I want to scrape requires authentication, so my goal is to log in and then scrape some user-specific data.
First I try to log in: I navigate to https://my.pitchbook.com/, and the site automatically redirects me to the following link: https://loginprod.morningstar.com/loginstate=hKFo2SA1NkY0R2IwakMyYVFwTXNGSF8zampvSVRwU21abWhOZqFupWxvZ2luo3RpZNkgX0NacFNHQTFfc29iOC1lckFEc3JGaFRaWHBNZkJ1Rk2jY2lk2SByWUMwT1V4 SDRpV05jbXpPanVwQjh6UnN0dWtlZXZyUg&client=rYC0OUxH4iWNcmzOjupB8zRstukeevrR&protocol=oauth2&redirect_uri=https%3A%2F%2Fmy.pitchbook.com%2Fauth0%2Fcallback&source=bus0155&response_type=code
After the redirect, a login page appears and I try to log in. However, I get an error:
I looked for a solution to the error, and I even added the following line:
chrome_options.add_argument('--ignore-certificate-errors') # Disable SSL verification
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument('--ignore-certificate-errors') # Disable SSL verification
# Set up WebDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_options)
# Navigate to the initial page
driver.get("https://my.pitchbook.com/")
try:
    # Increase the wait time
    wait = WebDriverWait(driver, 20)
    # Wait for the URL to change after redirection
    wait.until(EC.url_changes("https://my.pitchbook.com/"))
    # Wait for the login form to load on the redirected page
    email_element = wait.until(EC.presence_of_element_located((By.ID, "emailInput")))
    password_element = wait.until(EC.presence_of_element_located((By.ID, "passwordInput")))
    login_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "mds-button___ctrsi")))
    # Enter credentials (replace with actual credentials)
    email_element.send_keys("email")
    password_element.send_keys("password")
    login_button.click()
    # Wait for the login process to complete
    time.sleep(5)  # Adjust as necessary
    # Now you are authenticated, navigate to the desired page or interact with elements
    driver.get("https://my.pitchbook.com/dashboard/home")
    # Locate the element you want to interact with
    user_name = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "button__caption_a580497eb72793758caf95a9250e8342")))
    value = user_name.text
    print(value)
    # Wait before closing
    time.sleep(5)
finally:
    # Close the browser
    driver.quit()
Any help would be greatly appreciated! (I'm new to web scraping and Selenium.)
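A few things look like they need correcting. First, use try and except statements in your code (Python's equivalent of try/catch). Your script only has try/finally, which closes the browser but never reports what failed. With an except clause you can catch the error, see what went wrong, and decide whether the script should continue.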
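As a rough illustration of the try/except idea, here is a minimal sketch. The helper name `attempt` is hypothetical (it is not part of Selenium), and it catches every `Exception` only to keep the example free of external dependencies.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def attempt(label: str, action: Callable[[], T]) -> Tuple[Optional[T], Optional[Exception]]:
    """Run one step; on failure, report the error instead of crashing the script.

    `attempt` is a hypothetical helper for illustration only.
    """
    try:
        return action(), None
    except Exception as exc:  # broad on purpose here; narrow it in real code
        print(f"{label} failed: {type(exc).__name__}: {exc}")
        return None, exc


# Usage: the first step succeeds, the second raises and is reported.
value, error = attempt("add numbers", lambda: 1 + 1)
missing, failure = attempt("divide by zero", lambda: 1 / 0)
```

In the script from the question, each `wait.until(...)` call could be passed in as the `action`. In that case it would probably be better to catch `selenium.common.exceptions.TimeoutException` rather than every `Exception`, so that unrelated bugs are not hidden.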
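Second, keep in mind that an SSL handshake error occurs when something prevents a secure connection to the website from being established (I know, the name of the error is funny). Most likely you have antivirus software installed. If so, try disabling it temporarily and check whether the error goes away.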
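One way to check whether the handshake problem comes from the machine rather than from Selenium is to attempt a TLS handshake directly with Python's standard library. This is only a diagnostic sketch: the function name `check_tls_handshake` is made up for this example, and the idea that an unexpected certificate issuer points to antivirus HTTPS interception is an assumption, not something confirmed for this site.

```python
import socket
import ssl


def check_tls_handshake(host, port=443, timeout=5.0):
    """Try a TLS handshake with certificate verification.

    Returns (True, details) on success or (False, error text) on failure.
    Hypothetical helper for diagnosis only.
    """
    context = ssl.create_default_context()  # verifies certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # The issuer is a tuple of RDNs; flatten it into a dict.
                issuer = dict(rdn[0] for rdn in tls.getpeercert().get("issuer", ()))
                org = issuer.get("organizationName", "unknown")
                return True, f"{tls.version()}, certificate issuer: {org}"
    except (ssl.SSLError, OSError) as exc:
        # Covers certificate failures, refused connections, timeouts, DNS errors.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for name in ("my.pitchbook.com", "loginprod.morningstar.com"):
        print(name, check_tls_handshake(name))
```

If the handshake fails here as well, or the issuer shown is a security product rather than a public certificate authority, the problem is probably outside the Selenium script.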
Alternatively, try running the same code with Mozilla Firefox, Opera, Safari, or any other secure browser.
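Switching browsers mostly means constructing a different driver, so the rest of the script can stay the same. The sketch below assumes Selenium 4.6 or later (where Selenium Manager can locate the driver binary) and that the chosen browser is installed. `make_driver` is a hypothetical helper name. Opera is left out because Selenium 4 has no dedicated Opera driver class.

```python
def make_driver(browser="firefox"):
    """Return a Selenium WebDriver for the named browser (hypothetical helper)."""
    name = browser.lower()
    if name not in ("firefox", "chrome", "edge"):
        raise ValueError(f"unsupported browser: {browser!r}")

    # Imported here so the name check above works even without Selenium installed.
    from selenium import webdriver

    if name == "firefox":
        return webdriver.Firefox()
    if name == "edge":
        return webdriver.Edge()
    return webdriver.Chrome()


if __name__ == "__main__":
    driver = make_driver("firefox")
    try:
        driver.get("https://my.pitchbook.com/")
        print(driver.current_url)  # shows where the redirect ended up
    finally:
        driver.quit()
```

If the same handshake error appears in Firefox too, the browser is probably not the cause.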
Thank you! Take care!