Hi, I've been stuck on this problem for a few weeks and have searched everywhere trying to solve it. I'm trying to scrape some information from The Sun's football page. I can get the title of each box, and even the short description underneath it, but when I try to get the href the loop stops partway through and throws an error (selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//a"}).
Here is the code I've been running. I originally thought it was an element-lookup problem, until I added the print(link), because I was only getting back the first link, or just the error. I've tried changing how the element is located, e.g. by XPath, CSS, or tag name, but that didn't work either.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
website = 'https://www.thesun.co.uk/sport/football/'
path = 'D:\\Programming\\Automate with Python\\Automating\\chromedriver_win32'
chrome_options = Options()
chrome_options.add_experimental_option('detach', True)
service = Service(executable_path=path)
browser = webdriver.Chrome(options=chrome_options)
browser.get(website)
containers = browser.find_elements(by="xpath", value='//div[@class="teaser__copy-container"]')
titles = []
sub_titles = []
links = []
for container in containers:
    title = container.find_element(By.CSS_SELECTOR, 'span').get_attribute("textContent")
    sub_title = container.find_element(By.CSS_SELECTOR, 'h3').get_attribute("textContent")
    link = container.find_element(By.XPATH, './/a').get_attribute("href")
    titles.append(title)
    sub_titles.append(sub_title)
    links.append(link)
    print(link)
df_headlines = pd.DataFrame({'title': titles, 'sub-title': sub_titles, 'links': links})
df_headlines.to_csv('headline.csv')
Are the links on the site broken? Any help would be greatly appreciated, as this is a bit of a challenge and I'd like to solve it. Thanks in advance.
As I explained in the comments, change this line. The extra condition restricts the match to containers that actually have an "a" element inside them, so containers without a link never reach the .//a lookup:
containers = browser.find_elements(by="xpath", value='//div[@class="teaser__copy-container" and .//a]')
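The predicate idea can be sketched offline, without a browser, using Python's built-in xml.etree.ElementTree on a hypothetical miniature of the page markup (the class name, text, and URLs below are made up for illustration; note that ElementTree's limited XPath subset only supports immediate-child predicates, so div[a] stands in for the full descendant test .//a used with Selenium):

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the page: three teaser containers,
# one of which has no link inside it.
html = """
<root>
  <div class="teaser"><a href="https://example.com/1">one</a></div>
  <div class="teaser"><span>no link here</span></div>
  <div class="teaser"><a href="https://example.com/2">two</a></div>
</root>
"""

tree = ET.fromstring(html)

# Unfiltered: matches all three containers, so looking up an <a>
# in each one would fail on the middle container.
all_divs = tree.findall('.//div')

# Filtered with a child predicate, analogous to the Selenium XPath
# '//div[@class="teaser__copy-container" and .//a]': only containers
# that contain an <a> are returned.
linked_divs = tree.findall('.//div[a]')

links = [d.find('a').get('href') for d in linked_divs]
print(len(all_divs), len(linked_divs))  # 3 2
print(links)
```

With the filtered list, every element the loop visits is guaranteed to contain a link, which is exactly why the corrected Selenium selector stops the NoSuchElementException.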