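I want to use the class name to identify the valid search result list and then iterate over it to scrape the prices. However, the code still fails to find the class. I know the page uses JavaScript, but I thought Selenium could identify the tags after rendering. Which part did I get wrong? Thanks in advance.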
import time
import subprocess
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
#from playsound import playsound
import datetime
import threading
service = Service(executable_path='xxx')
option = webdriver.ChromeOptions()
option.add_argument("--headless=new")
option.add_argument('--ignore-certificate-errors')
option.add_argument("--no-sandbox")
option.add_argument('disable-notifications')
driver = webdriver.Chrome(service=service,options=option)
def search(dep,arr,date):
    print(f'''Input: Date:{date},Departure: {dep} - Arrival: {arr}''')
    temp_url = 'https://www.kayak.com/flights/'
    base_url = temp_url + dep+'-'+arr+'/'+str(date)+'?sort=price_a&fs=stops=0'
    df_record = pd.DataFrame(columns=['deptime','arrtime','dep','arr',
                                      'airline' ,'flightNum','price','ling'])
    print("before webdriver.ChromeOptions()")
    my_url = base_url
    driver.get(my_url)
    print(my_url)
    time.sleep(3) # set the time to wait till web fully loaded
    # wait for the close button to be visible and click it
    try:
        close_button = driver.find_element(By.XPATH, '//*[@class="nrc6"]')
        close_button.click()
    except:
        print("close is not found.")
    elem = driver.find_element("xpath","//*")
    source_code = elem.get_attribute("outerHTML")
    #print(source_code)
    bs = BeautifulSoup(source_code, 'html.parser')
    #print(bs)
    #expand
    drawing_url = bs.find_all('button', class_='nrc6')
    print(len(drawing_url)) # this shouldn't be zero
    if len(drawing_url)==0: return
    else: print(base_url)
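I'm not sure I fully understand your concern, but basically you want to get the list of prices from the site. On inspection, the prices all use the same class, "f8F1-price-text".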
driver.get("https://www.kayak.com/flights/SFO-TYO/2024-03-21/2024-03-28?sort=bestflight_a");
Thread.sleep(5000);
By tempElement = By.xpath("//div[@class='nrc6']");
List <WebElement> elmTicketDetails = driver.findElements(tempElement);
System.out.println("====================================================================================================================");
for (int cnt = 1; cnt <= elmTicketDetails.size(); cnt++) {
    By byFromDetail = By.xpath("//div[@class='nrc6'][" + cnt + "]//li[@class='hJSA-item'][1]");
    By byToDetail = By.xpath("//div[@class='nrc6'][" + cnt + "]//li[@class='hJSA-item'][2]");
    By byPrice = By.xpath("//div[@class='nrc6'][" + cnt + "]//div[@class='f8F1-price-text']");
    WebElement elmFromDetail = driver.findElement(byFromDetail);
    WebElement elmToDetail = driver.findElement(byToDetail);
    WebElement elmPrice = driver.findElement(byPrice);
    System.out.println("Details for Ticket # " + cnt);
    System.out.println("Flight From: " + elmFromDetail.getText());
    System.out.println("Flight To: " + elmToDetail.getText());
    System.out.println("Price: " + elmPrice.getText());
    System.out.println("====================================================================================================================");
}
The output is:
====================================================================================================================
Details for Ticket # 1
Flight From: 12:20 pm – 7:20 pm
+2
EVA Air
1 stop
TPE
39h 00m
SFO
-
NRT
Flight To: 1:00 pm – 6:40 am
+1
EVA Air
1 stop
TPE
33h 40m
NRT
-
SFO
Price: $1,207
====================================================================================================================
Details for Ticket # 2
Flight From: 11:50 am – 3:10 pm
...
This is written in Java, but the logic should be the same.
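For reference, here is a rough Python sketch of the same logic, since the question uses Python. It is not tested against the live site: the class names `nrc6`, `hJSA-item` and `f8F1-price-text` are copied from the Java above and may change at any time, and the helper names `build_search_url` and `scrape_cards` are made up for this example. Two things differ from the question's code: it matches `div` elements with class `nrc6` (the question's BeautifulSoup call looks for `button` elements), and it uses an explicit wait instead of a fixed `time.sleep`.

```python
# XPaths taken from the Java example; the class names are site-specific and may change.
CARD_XPATH = "//div[@class='nrc6']"               # one search result card
LEG_XPATH = ".//li[@class='hJSA-item']"           # legs, relative to a card
PRICE_XPATH = ".//div[@class='f8F1-price-text']"  # price, relative to a card


def build_search_url(dep, arr, date):
    """Build the search URL the same way the question's code does."""
    return f"https://www.kayak.com/flights/{dep}-{arr}/{date}?sort=price_a&fs=stops=0"


def scrape_cards(driver, url, timeout=20):
    """Return one dict per result card with its leg texts and price text."""
    # Imported here so build_search_url stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Wait until at least one card is present instead of sleeping a fixed time.
    cards = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, CARD_XPATH))
    )
    results = []
    for card in cards:
        # The leading "." keeps each search inside the current card.
        legs = card.find_elements(By.XPATH, LEG_XPATH)
        prices = card.find_elements(By.XPATH, PRICE_XPATH)
        results.append({
            "legs": [leg.text for leg in legs],
            "price": prices[0].text if prices else None,
        })
    return results
```

With the `driver` from the question, a call would look like `scrape_cards(driver, build_search_url("SFO", "TYO", "2024-03-21"))`. If no card appears within `timeout` seconds, the wait raises `TimeoutException`, which is usually easier to debug than a silent empty list.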
Update: I updated the example code. It prints each block as a whole, but you can declare multiple locators to point at specific details.
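If you would rather not hunt for a class name per field, one alternative to extra locators is to split the text you already have. The sketch below is only that: the function names `parse_price` and `parse_leg` are hypothetical, and the line layout it assumes (times first, an optional `+N` day offset, a duration such as `39h 00m`, and `origin`, `-`, `destination` as the last three lines) is inferred from the sample output above, so it may not hold for every result.

```python
import re


def parse_price(text):
    """Convert a price string such as '$1,207' to an int; None if it has no digits.

    Assumes whole-dollar prices, as in the sample output.
    """
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_leg(text):
    """Split one leg's multi-line text into a few named fields."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    leg = {
        "times": lines[0] if lines else None,
        "day_offset": None,
        "duration": None,
        "origin": None,
        "destination": None,
    }
    for ln in lines[1:]:
        if re.fullmatch(r"\+\d+", ln):         # e.g. "+2": arrives two days later
            leg["day_offset"] = int(ln)
        elif re.fullmatch(r"\d+h \d+m", ln):   # e.g. "39h 00m"
            leg["duration"] = ln
    # Assumes the text ends with "<origin>", "-", "<destination>", as in the sample.
    if len(lines) >= 3 and lines[-2] == "-":
        leg["origin"], leg["destination"] = lines[-3], lines[-1]
    return leg
```

Feeding it the text of one `hJSA-item` element and one `f8F1-price-text` element gives values that can go straight into a DataFrame row. Lines it does not recognise, such as the airline or the stop count, are simply ignored here.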