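I am trying to do some web scraping using URL:text to collect data on chassis movement history.
However, the date range selector on this page is a bit odd and hard to drive from Python. For example, to manually collect the movement history of a specific chassis over a custom time range, say 01/1/23 - 12/31/23, I have to click the selector many times (if this is unclear, follow the link above and try selecting a time range).
I don't know whether I can achieve this with Selenium or whether I should use some other tool.
HTML code: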
<div class="date_wrapper">
<div id="daterange" class="datepickerrange">
<svg width="24" height="24" viewBox="0 0 24 24" fill="black" xmlns="http://www.w3.org/2000/svg">
<path d="M17 4H21C21.2652 4 21.5196 4.10536 21.7071 4.29289C21.8946 4.48043 22 4.73478 22 5V21C22 21.2652 21.8946 21.5196 21.7071 21.7071C21.5196 21.8946 21.2652 22 21 22H3C2.73478 22 2.48043 21.8946 2.29289 21.7071C2.10536 21.5196 2 21.2652 2 21V5C2 4.73478 2.10536 4.48043 2.29289 4.29289C2.48043 4.10536 2.73478 4 3 4H7V2H9V4H15V2H17V4ZM15 6H9V8H7V6H4V10H20V6H17V8H15V6ZM20 12H4V20H20V12Z"></path>
</svg>
<span>07/9/24 - 08/7/24</span> <i class="toggle-icon"></i>
</div>
</div>
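For reference, the label shown in that span can be read with the standard library alone. This is only a minimal sketch: it assumes the label always has the form `MM/D/YY - MM/D/YY` and that the picker moves one month per click on a "previous month" control, neither of which I have confirmed. The helper names `parse_range_label` and `months_back` are made up for illustration.

```python
from datetime import date, datetime


def parse_range_label(label: str) -> tuple[date, date]:
    """Parse a label such as '07/9/24 - 08/7/24' into (start, end) dates."""
    start_text, end_text = (part.strip() for part in label.split(" - "))
    fmt = "%m/%d/%y"  # strptime also accepts non-padded days such as '9'
    return (
        datetime.strptime(start_text, fmt).date(),
        datetime.strptime(end_text, fmt).date(),
    )


def months_back(shown: date, target: date) -> int:
    """Clicks needed to step from the shown month back to the target month.

    Assumes (hypothetically) that one click moves the calendar by one month.
    """
    return (shown.year - target.year) * 12 + (shown.month - target.month)


start, end = parse_range_label("07/9/24 - 08/7/24")
print(start, end)
print(months_back(start, date(2023, 1, 1)))  # -> 18
```

With the default range above, reaching January 2023 would take 18 single-month steps, which is why doing it by hand is tedious.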
I tried to do this with Selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time
import os
import pandas as pd
from selenium.webdriver.common.keys import Keys
from datetime import datetime
import csv
from bs4 import BeautifulSoup
chrome_options = Options()
# chrome_options.add_argument("--headless")
# chrome_options.add_argument("--disable-gpu")
download_directory = '/Users/temp/py_scraping_selenium_test'
prefs = {'download.default_directory': download_directory}
chrome_options.add_experimental_option('prefs', prefs)
filename = '/Users/temp/py_scraping_selenium_test/chasis_info_test_1.csv'
driver = webdriver.Chrome(options=chrome_options)
# URL of the chassis tracking page to scrape
initial_url = 'https://dcli.com/track-a-chassis/?0-chassisType=vin&searchChassis=LJRC41269G1020818' # Update this URL
# Navigate to the initial page
driver.get(initial_url)
# Set up an explicit wait (up to 600 seconds)
wait = WebDriverWait(driver, 600)
# Locate and click the date range picker to open it using JavaScript to ensure it is clicked
date_range_picker = driver.find_element(By.ID, 'daterange')
driver.execute_script("arguments[0].click();", date_range_picker)
# Wait for the date picker to be available
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'input[name="startDate"]')))
# Locate the start and end date input fields
start_date_input = driver.find_element(By.CSS_SELECTOR, 'input[name="startDate"]') # Adjust selector as needed
end_date_input = driver.find_element(By.CSS_SELECTOR, 'input[name="endDate"]') # Adjust selector as needed
# Clear any existing values
start_date_input.clear()
end_date_input.clear()
# Set the desired start and end dates
start_date = '01/01/2023' # Adjust to your desired start date
end_date = '12/31/2023' # Adjust to your desired end date
start_date_input.send_keys(start_date)
end_date_input.send_keys(end_date)
# Submit the form or trigger the update
submit_button = driver.find_element(By.XPATH, '//button[text()="Search"]') # Adjust selector as needed
submit_button.click()
# Wait for the updated results to load
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table')))
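But I get a lot of errors...
What I expect is a table (CSV) of the movement history of a specific chassis over a custom time range (01/1/23 - 12/31/23).
Any help? Thanks in advance!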
wait.until(EC.element_to_be_clickable((By.ID, 'daterange')))
Try this. I'm sure it will work. Let me know!
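Once the results are on the page, the table markup still has to be turned into CSV. Here is a rough sketch of that step using only the standard library. It assumes the results render as a plain `<table>` with `<tr>`, `<th>` and `<td>` elements (the question waits for a `table`, but I have not checked the real markup), and that you pass in HTML taken from something like `driver.page_source`. The names `TableRows` and `table_html_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup; the real page's columns may differ.
sample = (
    "<table><tr><th>Date</th><th>Event</th></tr>"
    "<tr><td>01/05/23</td><td>Gate out</td></tr></table>"
)
print(table_html_to_csv(sample))
```

To save the result, write the returned string to the `filename` path from the question. Since the question already imports BeautifulSoup and pandas, either of those could replace the hand-written parser.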