I'm trying to use Selenium to scroll to a specific section of a web page and retrieve the text from that section.
Background:
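I'm working with a web page that disables text highlighting through CSS properties such as user-select: none and -webkit-user-select: none. I can override these properties with JavaScript, but my main challenge now is automatically scrolling down to the "Production / Artist" section in the DOM and then getting its text.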
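A minimal sketch of such an override, assuming a Selenium driver is already open on the page: it injects a stylesheet that forces user-select back to text on every element. The names ENABLE_SELECTION_JS and enable_text_selection are hypothetical. Two caveats: the injected style is lost on every new page load, so it has to be re-run after navigation, and Selenium's element.text does not depend on user-select at all, so this override only matters for selecting or copying text by hand.

```python
# Hypothetical helper: re-enable text selection on a page whose CSS sets
# user-select: none. Only driver.execute_script is used, so any Selenium
# WebDriver (Chrome, Firefox, ...) can be passed as `driver`.
ENABLE_SELECTION_JS = """
const style = document.createElement('style');
style.textContent = '* { user-select: text !important; -webkit-user-select: text !important; }';
document.head.appendChild(style);
"""


def enable_text_selection(driver) -> None:
    """Inject a stylesheet that overrides user-select for every element.

    Assumption: the page's own rule is not itself !important with a more
    specific selector (or an inline !important style), which would still win.
    The injected <style> disappears on the next page load, so call this again
    after each navigation.
    """
    driver.execute_script(ENABLE_SELECTION_JS)
```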
Here is the URL of the web page I'm working with:
Web page link
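I tried using Selenium to scroll to the "Production / Artist" section, but I'm not sure whether I'm using the right approach for this particular page structure.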
My current code:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize WebDriver
driver = webdriver.Chrome()
# Open the URL
url = "https://www.art-mate.net/doc/78492?name=%E6%A8%82%E3%83%BB%E8%AA%BC%E7%8D%A8%E5%A5%8F%E5%AE%B6%E6%A8%82%E5%9C%98%E2%94%80%E2%94%80%E5%A4%A7%E6%8F%90%E7%90%B4%E8%88%87%E9%A6%AC%E7%89%B9%E8%AB%BE%E7%90%B4%E3"
driver.get(url)
# Scroll to the "Production / Artist" section
element = driver.find_element(By.XPATH, "//h2[text()='Production / Artist']")
driver.execute_script("arguments[0].scrollIntoView();", element)
# Now attempt to copy the text from the section
production_artist_section = driver.find_element(By.XPATH, "//div[contains(text(), 'Production / Artist')]")
print(production_artist_section.text)
# Close the driver
driver.quit()
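The first locator above only matches if the heading is an h2 element whose own text node is exactly "Production / Artist"; extra whitespace, nested markup, or a different heading tag will make find_element raise NoSuchElementException. The second locator, if it matches at all, returns a div containing the label rather than necessarily the section body. The page's actual markup is not shown here, so the following is only a more tolerant sketch: it accepts several heading tags and compares the whitespace-normalized text using normalize-space(). xpath_literal and heading_xpath are hypothetical helper names; XPath 1.0 has no escape syntax for quotes, hence the concat() fallback.

```python
def xpath_literal(text: str) -> str:
    """Quote a string as an XPath 1.0 literal (XPath 1.0 has no escape syntax)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # The text contains both quote kinds: split on ' and rebuild it with concat().
    pieces = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def heading_xpath(label: str, tags=("h1", "h2", "h3", "h4")) -> str:
    """XPath matching any of `tags` whose whitespace-normalized text equals `label`."""
    tag_test = " or ".join(f"self::{t}" for t in tags)
    return f"//*[{tag_test}][normalize-space()={xpath_literal(label)}]"


print(heading_xpath("Production / Artist"))
# → //*[self::h1 or self::h2 or self::h3 or self::h4][normalize-space()='Production / Artist']
```

With a real driver the result would be used as driver.find_element(By.XPATH, heading_xpath("Production / Artist")).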
My question:
How can I make sure Selenium scrolls smoothly and accurately to the "Production / Artist" section of the page before I try to get its text?
Any help or suggestions on how to optimize the scrolling behavior would be greatly appreciated!
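One way to make the scroll both smooth and verifiable, sketched under the assumption that element is the WebElement for the section heading: pass an options object to scrollIntoView, then poll until the element's top edge is inside the viewport and window.scrollY has stopped changing. With behavior: 'smooth', the scrollIntoView call returns while the animation is still running, which is why the sketch waits afterwards. The helper name scroll_into_view and the settle check are illustrative choices, not a Selenium API. Note also that WebElement.text does not require the element to be in the viewport, only to be rendered, so scrolling mainly matters if the page happens to load content lazily or you need the element on screen.

```python
import time

# Scroll the element so its top aligns with the top of the viewport.
# arguments[1] is the behavior: "smooth" (animated) or "auto" (instant jump).
SCROLL_JS = "arguments[0].scrollIntoView({behavior: arguments[1], block: 'start'});"

# Return the current vertical scroll offset and whether the element's top edge
# lies inside the viewport (the -1 tolerance absorbs sub-pixel rounding).
STATE_JS = """
const r = arguments[0].getBoundingClientRect();
return [window.scrollY, r.top > -1 && r.top < window.innerHeight];
"""


def scroll_into_view(driver, element, smooth=True, timeout=10.0, poll=0.1):
    """Scroll `element` into view and wait until the scroll has settled.

    Returns True once the element's top is in the viewport and window.scrollY
    is unchanged between two consecutive polls, or False if `timeout` expires.
    """
    driver.execute_script(SCROLL_JS, element, "smooth" if smooth else "auto")
    deadline = time.monotonic() + timeout
    last_y = None
    while time.monotonic() < deadline:
        y, in_view = driver.execute_script(STATE_JS, element)
        if in_view and y == last_y:
            return True
        last_y = y
        time.sleep(poll)
    return False
```

With a real driver this would be called as scroll_into_view(driver, element) before reading element.text. Selenium's WebDriverWait could replace the hand-written polling loop; the loop uses only the standard library here so the sketch stays self-contained.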
Check the working code below; the explanation is in the comments:
Code:
import time
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
url = "https://www.art-mate.net/doc/78492?name=%E6%A8%82%E3%83%BB%E8%AA%BC%E7%8D%A8%E5%A5%8F%E5%AE%B6%E6%A8%82%E5%9C%98%E2%94%80%E2%94%80%E5%A4%A7%E6%8F%90%E7%90%B4%E8%88%87%E9%A6%AC%E7%89%B9%E8%AB%BE%E7%90%B4%E3"
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 10)
# Click the 'En' link to switch the page to English
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='cms_lang cms_lang_en']"))).click()
# Fixed pause to let the English version of the page load
time.sleep(5)
# Wait until all role labels and all name links are visible, then collect them
people = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='people_cell people_role']")))
name = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='people_box']//a")))
people_roles = []
people_names = []
# Append the text of each located element to the matching list
for p in people:
    people_roles.append(p.text)
for n in name:
    people_names.append(n.text)
print("People roles:", people_roles)
print("People names:", people_names)
Console output:
People roles: ['Presented by', 'Artistic Director / Cello', 'Ondes Martenot', 'Composer', 'Viola', 'Performed by']
People names: ['Musicus Society', 'Trey Lee', 'Nadia Ratsimandresy', 'Seung-Won Oh', 'Aurélie Entringer', 'Musicus Soloists Hong Kong']
Process finished with exit code 0
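The answer collects roles and names into two parallel lists. If, as in the console output above, each role corresponds to exactly one name in the same order, the lists can be paired by position. That one-to-one correspondence is an assumption: a role with several names (or none) would shift every later pair, so this sketch refuses to pair lists of different lengths instead of silently truncating as plain zip would. pair_roles is a hypothetical helper name, and the input lists are copied from the console output above.

```python
def pair_roles(roles, names):
    """Pair each role with the name at the same index.

    Assumes a one-to-one, same-order correspondence between the two lists;
    raises ValueError on a length mismatch instead of silently truncating.
    """
    if len(roles) != len(names):
        raise ValueError(f"{len(roles)} roles but {len(names)} names; cannot pair by position")
    return list(zip(roles, names))


people_roles = ['Presented by', 'Artistic Director / Cello', 'Ondes Martenot', 'Composer', 'Viola', 'Performed by']
people_names = ['Musicus Society', 'Trey Lee', 'Nadia Ratsimandresy', 'Seung-Won Oh', 'Aurélie Entringer', 'Musicus Soloists Hong Kong']

for role, person in pair_roles(people_roles, people_names):
    print(f"{role}: {person}")
```

The result is a list of tuples rather than a dict because the same role (for example two violists) could appear more than once, and a dict would keep only the last one.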