I'm trying to scrape this website: https://www.vertexconnects.com/find-atc
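After entering any zip code, I can't get the `while` loop to keep clicking the "Load More" button. The code seems to fail on the `locations` line, which fetches each location result, with this error: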
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Here is the code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options = options)
action = ActionChains(driver)
driver.get("https://www.vertexconnects.com/find-atc")
driver.maximize_window()
wait = WebDriverWait(driver,5)
# Use below line only if you are getting the Accept/Reject cookies pop-up
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(.,'Accept All')]"))).click()
location_textbox = wait.until(EC.presence_of_element_located((By.ID,"location-search-input")))
action.move_to_element(location_textbox).click().send_keys("10001").perform()
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "atc-finder-button"))).click()
while True:
    try:
        wait.until(EC_element_to_be_clickable((By.ID, "loadMore"))).click()
    except:
        break
print("done")
locations = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='location-result']")))
for location in locations:
    name = location.find_element(By.TAG_NAME, "h4").text()
    address = location.find_element(By.CLASS_NAME, "address atc-finder-hospital-address").text()
    phone_num = location.find_element(By.TAG_Name, "a").text
    print(name, address, phone_num)
Issues in your code:
Your `while` loop isn't actually clicking the Load More button, for the following reasons:
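1a. The line below contains a typo: it should be `EC.` (your alias for `expected_conditions`), not `EC_`. Python raises a `NameError` for that undefined name at runtime.
wait.until(EC_element_to_be_clickable((By.ID, "loadMore"))).click()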
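The typo doesn't crash the script because the loop wraps the call in a bare `except:`, which catches the `NameError` like any other exception, so the loop breaks on its first pass without clicking anything. The plain-Python sketch below (no Selenium; `ButtonGone` and both function names are hypothetical) contrasts a bare `except:` with one narrowed to the exception you actually expect, which lets the typo surface.

```python
def click_with_bare_except() -> int:
    clicks = 0
    while True:
        try:
            # Typo: EC_ instead of EC. -> NameError on the very first pass
            EC_element_to_be_clickable(("id", "loadMore"))
            clicks += 1
        except:  # noqa: E722 -- a bare except also catches the NameError
            break
    return clicks


class ButtonGone(Exception):
    """Hypothetical stand-in for 'the Load More button no longer exists'."""


def click_with_narrow_except() -> int:
    clicks = 0
    while True:
        try:
            EC_element_to_be_clickable(("id", "loadMore"))  # same typo
            clicks += 1
        except ButtonGone:  # only the expected "no more results" case ends the loop
            break
    return clicks


print(click_with_bare_except())  # 0: the loop exits silently, nothing was clicked

try:
    click_with_narrow_except()
except NameError as exc:
    print("typo surfaced:", exc)
```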
1b. Selenium can't locate the Load More button with `EC.element_to_be_clickable` (the wait times out). Wait for it with `EC.presence_of_element_located` instead.
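If you want the Load More loop in a reusable form, here is a minimal sketch of a click-until-gone helper. It assumes nothing about Selenium beyond duck typing: `find_button` is any callable that returns an object with a `.click()` method, and `stop_on` lists the exceptions that mean "no more results". The `max_clicks` cap is an added safeguard (not part of the original answer) so a button that never disappears can't loop forever. `click_until_gone`, `FakePage` and `NoMoreButton` are hypothetical names.

```python
import time
from typing import Any, Callable, Tuple, Type


def click_until_gone(
    find_button: Callable[[], Any],
    stop_on: Tuple[Type[BaseException], ...],
    max_clicks: int = 100,
    pause: float = 0.0,
) -> int:
    """Click whatever find_button() returns until it raises one of stop_on.

    Returns the number of successful clicks; max_clicks caps the loop so a
    button that never goes away cannot spin forever.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            find_button().click()
        except stop_on:
            break  # the button is gone (or the click failed): stop
        clicks += 1
        if pause:
            time.sleep(pause)  # give the next batch of results time to render
    return clicks


# Hypothetical stand-in for the results page: the button disappears after N loads.
class NoMoreButton(Exception):
    pass


class FakePage:
    def __init__(self, batches: int):
        self.remaining = batches

    def find(self):
        if self.remaining == 0:
            raise NoMoreButton()
        return self

    def click(self):
        self.remaining -= 1


print(click_until_gone(FakePage(3).find, stop_on=(NoMoreButton,)))  # 3
```

With Selenium, `find_button` could be `lambda: wait.until(EC.presence_of_element_located((By.ID, "loadMore")))` and `stop_on` could be `(WebDriverException,)` imported from `selenium.common.exceptions`; that base class covers both the wait's `TimeoutException` and exceptions raised by `click()`. Passing `pause=3` mirrors the `time.sleep(3)` in the refactored code.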
2. In the lines below, the calls should end with `.text`, not `.text()`: `text` is a property, not a method.
name = location.find_element(By.TAG_NAME, "h4").text()
address = location.find_element(By.CLASS_NAME, "address atc-finder-hospital-address").text()
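Because `text` already returns a `str`, the trailing `()` tries to call that string and raises `TypeError: 'str' object is not callable`. The hypothetical stand-in class below reproduces this without a browser, using a property of the same shape as `WebElement.text`.

```python
class FakeElement:
    """Minimal stand-in for a Selenium WebElement (hypothetical)."""

    def __init__(self, text):
        self._text = text

    @property
    def text(self):
        # Like WebElement.text, this is a property returning a plain str
        return self._text


h4 = FakeElement("Boston Medical Center")
print(h4.text)  # Boston Medical Center

try:
    h4.text()  # evaluates the property, then tries to call the str
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```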
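3. The locator strategy below is incorrect. `By.CLASS_NAME` takes a single class name, so you can't pass several classes at once. Here `address` is one class and `atc-finder-hospital-address` is another. Use an XPath (as in the refactored code) or a CSS selector instead.
By.CLASS_NAME, "address atc-finder-hospital-address"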
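A CSS selector can match several classes by chaining them with dots. The helper below is a hypothetical convenience, not part of Selenium: it turns a space-separated `class` attribute value into a compound selector you could pass to `By.CSS_SELECTOR`. Unlike the XPath test `[@class='...']`, which needs the attribute string to match exactly, the compound selector matches elements carrying both classes in any order, even alongside other classes. It assumes the class names contain no characters that need CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn "a b c" into ".a.b.c" (a compound CSS class selector)."""
    classes = class_attr.split()  # split() also collapses repeated whitespace
    if not classes:
        raise ValueError("class_attr contains no class names")
    return "".join("." + name for name in classes)


selector = classes_to_css("address atc-finder-hospital-address")
print(selector)  # .address.atc-finder-hospital-address
```

In the scraper this would be used as `location.find_element(By.CSS_SELECTOR, selector)`.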
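4. If you look at the page, not every hospital has a phone number, so the line below raises `NoSuchElementException` whenever a result has no `<a>` tag. (`By.TAG_Name` is also a typo for `By.TAG_NAME`; as written it raises `AttributeError` before the search even runs.)
phone_num = location.find_element(By.TAG_Name, "a").text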
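The plural `find_elements` returns an empty list instead of raising, which is what makes an optional field easy to handle. The sketch below wraps that check in a small helper; it is duck-typed so it runs without a browser, and `first_text_or_none` and `FakeLink` are hypothetical names. With Selenium you would pass it `location.find_elements(By.TAG_NAME, "a")`.

```python
from typing import Optional, Sequence


def first_text_or_none(elements: Sequence) -> Optional[str]:
    """Return the .text of the first element, or None if the list is empty."""
    return elements[0].text if elements else None


class FakeLink:
    """Hypothetical stand-in for a WebElement that only has .text."""

    def __init__(self, text):
        self.text = text


print(first_text_or_none([FakeLink("(617) 638-8130")]))  # (617) 638-8130
print(first_text_or_none([]))  # None
```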
Here is the refactored code:
import time
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)
driver.get("https://www.vertexconnects.com/find-atc")
driver.maximize_window()
wait = WebDriverWait(driver, 10)

# Use below line only if you are getting the Accept/Reject cookies pop-up
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(.,'Accept All')]"))).click()
wait.until(EC.element_to_be_clickable((By.ID, "location-search-input"))).send_keys("10001")
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "atc-finder-button"))).click()

# Keep clicking "Load More" until it can no longer be found or clicked
while True:
    try:
        wait.until(EC.presence_of_element_located((By.ID, "loadMore"))).click()
        time.sleep(3)  # give the next batch of results time to render
    except WebDriverException:  # covers TimeoutException and failed clicks
        break
print("done")

locations = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='location-result']")))
for location in locations:
    name = location.find_element(By.TAG_NAME, "h4").text
    # The leading "." keeps the XPath relative to this result card
    address = location.find_element(By.XPATH, ".//div[@class='address atc-finder-hospital-address']").text
    # find_elements returns an empty list (instead of raising) when there is no <a>
    phone_num = location.find_elements(By.TAG_NAME, "a")
    if len(phone_num) > 0:
        print(name, address, phone_num[0].text)
    else:
        print(name, address)
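Console output (captured while the address XPath began with `//`, which searches the whole page rather than the current `location`, so every row shows the first address on the page; starting it with `.//` keeps the search inside each result):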
done
Cohen Children's Medical Center 269-01 76th Avenue, New Hyde Park, NY 11040, US
Children’s Hospital of Philadelphia 269-01 76th Avenue, New Hyde Park, NY 11040, US (267) 601-3461
Dana-Farber Brigham Cancer Center 269-01 76th Avenue, New Hyde Park, NY 11040, US (877) 442-3324
Massachusetts General Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US (617) 643-9042
Boston Medical Center 269-01 76th Avenue, New Hyde Park, NY 11040, US (617) 638-8130
Children's National Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US (202) 476-5367
CLEVELAND CLINIC 269-01 76th Avenue, New Hyde Park, NY 11040, US (216) 444-5517
Nationwide Children's Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US (614) 722-6425 Option 6
Ohio State University Wexner Medical Center 269-01 76th Avenue, New Hyde Park, NY 11040, US (614) 293-3153
Cincinnati Children's Hospital Medical Center 269-01 76th Avenue, New Hyde Park, NY 11040, US (513) 517-2234
University of Chicago 269-01 76th Avenue, New Hyde Park, NY 11040, US (773) 702-6808
Northwestern Memorial Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US (312) 695-0990
The Children’s Hospital At Tristar Centennial 269-01 76th Avenue, New Hyde Park, NY 11040, US
Children's Hospital of New Orleans 269-01 76th Avenue, New Hyde Park, NY 11040, US
Medical City Dallas Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US
Methodist Hospital 269-01 76th Avenue, New Hyde Park, NY 11040, US
City of Hope National Medical Center 269-01 76th Avenue, New Hyde Park, NY 11040, US (800) 826-4673 [email protected]
Children's Hospital of Orange County (CHOC) 269-01 76th Avenue, New Hyde Park, NY 11040, US
Process finished with exit code 0
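The sketch below reproduces the `//` versus `.//` difference without a browser, using the standard-library `xml.etree.ElementTree` on hypothetical markup shaped like the result cards (the hospital names and addresses are made up). ElementTree refuses absolute paths when searching from an element, so the browser's document-wide `//` search is emulated by searching from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two result cards, each with its own address.
PAGE = """
<div id="results">
  <div class="location-result">
    <h4>Hospital A</h4>
    <div class="address atc-finder-hospital-address">1 First St</div>
  </div>
  <div class="location-result">
    <h4>Hospital B</h4>
    <div class="address atc-finder-hospital-address">2 Second St</div>
  </div>
</div>
""".strip()

ADDRESS = "div[@class='address atc-finder-hospital-address']"

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='location-result']")

# Emulates "//div[...]": the search starts at the document root for every card.
from_root = [root.find(".//" + ADDRESS).text for _ in cards]

# ".//div[...]": the search starts at each card.
per_card = [card.find(".//" + ADDRESS).text for card in cards]

print(from_root)  # ['1 First St', '1 First St']
print(per_card)   # ['1 First St', '2 Second St']
```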