Validating a URL with Python and Selenium

Question (votes: 0, answers: 2)

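I want to do some basic URL validation: if the URL is invalid, the script should not go ahead with the request until the user has entered a valid one.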

    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    user_url = input('Please enter a valid url:')
    driver = webdriver.Chrome('/home/m/Desktop/chromedriver')
    driver.get(user_url)
    HEADERS = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36', 'accept': '*/*'}

    time.sleep(8)

    imagecounter = driver.find_elements_by_css_selector('img')

    print('Number of HTML image tags:')
    print(len(imagecounter))

Could you modify the code and explain what is going on? I have tried a few libraries, but had no luck, which I put down to my poor coding skills.

python selenium validation url python-requests
2 Answers
0 votes

You can use requests to fetch the page and check its HTTP status code before handing the URL to Selenium:

    import time

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    HEADERS = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36', 'accept': '*/*'}

    user_url = input('Please enter a valid url:')

    # Send a GET request on every attempt; keep asking for a different
    # url until the status code is OK. requests raises RequestException
    # for malformed or unreachable urls, so treat that as invalid too.
    while True:
        try:
            req = requests.get(user_url, headers=HEADERS, timeout=10)
            if req.status_code == requests.codes['ok']:
                break
        except requests.exceptions.RequestException:
            pass
        user_url = input('Please enter a valid url:')

    driver = webdriver.Chrome()
    driver.get(user_url)

    time.sleep(8)

    # find_elements(By...) works in both Selenium 3 and Selenium 4
    imagecounter = driver.find_elements(By.CSS_SELECTOR, 'img')

    print('Number of HTML image tags:')
    print(len(imagecounter))
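
If you also want to reject obviously malformed input before any network traffic happens, a purely syntactic check with the standard library is one option. This is only a minimal sketch: the helper name `looks_like_http_url` is made up for illustration, and it assumes you only want to accept `http` and `https` URLs. It does not tell you whether the page actually exists, so it complements the status-code check rather than replacing it.

```python
from urllib.parse import urlparse


def looks_like_http_url(text):
    """Return True if text parses as an http(s) URL with a host part.

    Hypothetical helper: it checks syntax only, not reachability.
    """
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unclosed IPv6 bracket in the host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Calling it on the raw input first lets the loop skip the request entirely for strings such as `www.google.com`, which have no scheme.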

0 votes

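To validate the user-supplied url before continuing, you can use Python's requests module to check the status of the request. You can use the following solution: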

  • Code block:

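    from selenium import webdriver
    import requests

    while True:
        user_url = input("Please enter a valid url:")
        try:
            # requests raises RequestException for malformed or unreachable urls
            req = requests.get(user_url, timeout=10)
        except requests.exceptions.RequestException:
            print("Not a valid url, please try again...")
            continue
        if req.status_code != requests.codes['ok']:
            print("Not a valid url, please try again...")
            continue
        break
    print("URL was a valid one... Continuing...")
    # executable_path is the Selenium 3 keyword; Selenium 4 takes a Service object instead
    driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get(user_url)
    # perform your rest of the tasks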
    
  • Console output:

    Please enter a valid url:https://www.goodday.com
    Not a valid url, please try again...
    Please enter a valid url:https://www.goodday.com
    Not a valid url, please try again...
    Please enter a valid url:https://www.goodday.com
    Not a valid url, please try again...
    Please enter a valid url:https://www.google.com
    URL was a valid one... Continuing...
    
    DevTools listening on ws://127.0.0.1:54638/devtools/browser/975e0993-166a-4144-a05f-dcfb1d9b29a2
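
The retry loop above can also be pulled out into a small function, which keeps the prompt logic separate from whatever check you decide to use. The following is a rough sketch, not part of the original answer: `prompt_until_valid` and its parameters `is_valid`, `read` and `max_attempts` are all hypothetical names, and the attempt limit is an added assumption so that the loop cannot run forever.

```python
def prompt_until_valid(is_valid, read=input, max_attempts=3):
    """Ask for a URL until is_valid(url) is true or attempts run out.

    is_valid, read and max_attempts are illustrative parameters.
    Returns the accepted URL, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        url = read("Please enter a valid url:").strip()
        if is_valid(url):
            return url
        print("Not a valid url, please try again...")
    return None
```

For `is_valid` you would pass a function that performs the requests check and returns False when the request raises an exception or comes back with a non-OK status. Only call `driver.get()` if the return value is not None.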
    

