Problem with web scraping using Selenium in Python

Question (votes: 0, answers: 2)

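I have written Python code to log in to a website. It takes a very long time to run, does not produce the expected result, and prints a message that says "DevTools listening on". The send_keys command does not work: it does not type the username and password into the input boxes.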

The code is:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
username="rahbar"
password="password"
print("something before scrape")
driver = webdriver.Chrome()

driver.get("https://plp.irbroker.com/index.do")
time.sleep(100) #waits 100 seconds
driver.maximize_window()
driver.find_element("name", "j_username").send_keys(username)

driver.find_element_by_name( name="j_password" ).send_keys(password)
print("something after scrape")

The output printed in the terminal is:

something before scrape
*DevTools listening on ws://127.0.0.1:57315/devtools/browser/fd73344f-c857-439e-b20c-c0e76d39f389
Created TensorFlow Lite XNNPACK delegate for CPU.
Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#58 is a dynamic-sized tensor)*

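I expect the code to open the web page within a reasonably short time and type the username and password into the input boxes. Could you help me solve this? (I am new to Python, so I apologize if the question is trivial.)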

python selenium-webdriver devtools
2 Answers
0 votes
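  • Don't rely on time.sleep(100); it always pauses for the full 100 seconds, even when the page is already loaded.
  • Instead, replace the fixed delay with WebDriverWait, which waits only until the element has loaded.
  • Here is an updated version of the code: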
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time

username = "rahbar"
password = "password"
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://plp.irbroker.com/index.do")
try:
    # Wait up to 10 seconds for the username field to appear in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.NAME, "j_username"))
    )
    driver.find_element(By.NAME, "j_username").send_keys(username)
    driver.find_element(By.NAME, "j_password").send_keys(password)
    driver.find_element(By.XPATH, "//button[@type='submit']").click()
    print("Login fields populated successfully.")

except Exception as e:
    print(f"Error occurred: {e}")
finally:
    # Keep the browser open briefly to inspect the result, then close it
    time.sleep(10)
    driver.quit()
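
To see why an explicit wait finishes sooner than a fixed time.sleep(100), here is a minimal, library-free sketch of the polling idea behind WebDriverWait. It is not Selenium's implementation; wait_until, its parameters, and make_fake_lookup are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it is available, or raises
    TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def make_fake_lookup(ready_after_calls):
    """Build a stand-in for an element lookup that succeeds on the N-th call."""
    calls = {"count": 0}

    def lookup():
        calls["count"] += 1
        return "element" if calls["count"] >= ready_after_calls else None

    return lookup
```

Calling wait_until(make_fake_lookup(3), timeout=10) returns as soon as the third lookup succeeds rather than pausing for the whole timeout, which is the behavior an explicit wait gives you over a fixed sleep.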

0 votes

Let's improve the code before doing the actual scraping

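I suggest moving all the data to the beginning of the code. It is easier to review everything the script needs when it is in one place: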

# data for log-in
username = "rahbar"
password = "password"
site_url = "https://plp.irbroker.com/index.do"
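
If you would rather not hard-code the credentials, one possible variation is to keep the same three values in a single settings object and read them from environment variables, with the sample values above as fallbacks. LoginSettings, load_settings, and the environment variable names PLP_USERNAME, PLP_PASSWORD, and PLP_SITE_URL are assumptions made up for this sketch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginSettings:
    username: str
    password: str
    site_url: str


def load_settings(env=None):
    """Read login data from the environment, falling back to the sample values."""
    # Accept any mapping so the function can be exercised without touching os.environ
    env = os.environ if env is None else env
    return LoginSettings(
        username=env.get("PLP_USERNAME", "rahbar"),
        password=env.get("PLP_PASSWORD", "password"),
        site_url=env.get("PLP_SITE_URL", "https://plp.irbroker.com/index.do"),
    )
```

The rest of the script could then use settings = load_settings() and refer to settings.username, settings.password, and settings.site_url.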

Now let's focus on optimizing and refining a few points

First, let's handle browser initialization and opening the site, and add the maximized-window option to the browser launch arguments:

# initialize driver
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.get(site_url) 

Now, instead of time.sleep(), which is not always accurate or reliable, we use an explicit wait that returns each element once it is available on the page:

# wait until both input elements will be on screen
username_element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='j_username']")))
password_element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='j_password']")))

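If you need to enter data into an input element, it is better to use EC.element_to_be_clickable. This condition checks not only that the element is present on the page but also that it can be interacted with (that is, that it is visible and enabled).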
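
As a rough illustration of that extra check, the sketch below models it with the WebElement methods is_displayed() and is_enabled(). It is a simplified model, not Selenium's actual implementation of element_to_be_clickable; FakeElement and is_interactable are hypothetical names introduced for the example.

```python
class FakeElement:
    """Hypothetical stand-in for a WebElement, for illustration only."""

    def __init__(self, displayed, enabled):
        self._displayed = displayed
        self._enabled = enabled

    def is_displayed(self):
        return self._displayed

    def is_enabled(self):
        return self._enabled


def is_interactable(element):
    """Return True only if the element is both visible and enabled."""
    return element.is_displayed() and element.is_enabled()
```

An element that exists in the page but is hidden or disabled fails this check, which is why waiting only for presence can still leave send_keys unable to type into the field.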


Now let's clear the input fields (if necessary) and enter the values:

# clear them (not important if the site does not pre-fill a start value)
username_element.clear()
password_element.clear()
# set values
username_element.send_keys(username)
password_element.send_keys(password)

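Finally, the last part (optional, since the question did not mention it): we wait until the login button is clickable and then click it (this works the same way as for the input fields):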

# wait until the login button is clickable, then click it
# (click() returns None, so keep the element and click it separately)
log_in = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']")))
log_in.click()

Now the code is optimized and clear.

Here is the full version:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# data for log-in
username = "rahbar"
password = "password"
site_url = "https://plp.irbroker.com/index.do"

# something before scrap

# initialize driver
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.get(site_url)

# wait until both input elements will be on screen
username_element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='j_username']")))
password_element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='j_password']")))
# clear them (not important if the site does not pre-fill a start value)
username_element.clear()
password_element.clear()
# set values
username_element.send_keys(username)
password_element.send_keys(password)
# wait until the login button is clickable, then click it
# (click() returns None, so keep the element and click it separately)
log_in = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']")))
log_in.click()

# and then thing after scrap, good luck with scrap btw