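I'd like to know how to retrieve dynamically generated content with Selenium. The solutions suggested previously (e.g., 1) don't work in my case. Let me illustrate the problem with an example. Specifically, I can't use Selenium to select the dynamically generated elements on this page.
Here is the sample Python code I'm using: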
from selenium import webdriver
from selenium.webdriver.common.by import By

if __name__ == '__main__':
    driver = webdriver.Chrome()
    driver.get("https://www.boplatssyd.se/mypages/app")
    try:
        html = driver.find_element(by=By.TAG_NAME, value='html').get_attribute('innerHTML')
        print(html)
    except Exception as e:
        print("An error occurred:", e)
The result is similar to what I get from Chrome's "View page source", but notice that the dynamic elements are missing:
<!DOCTYPE html>
<html lang="sv">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="icon" href="/mypages/app/favicon.ico" />
<link rel="apple-touch-icon" sizes="180x180" href="/mypages/app/apple-touch-icon.png" />
<link rel="icon" type="image/png" sizes="32x32" href="/mypages/app/favicon-32x32.png" />
<link rel="icon" type="image/png" sizes="16x16" href="/mypages/app/favicon-16x16.png" />
<link rel="manifest" href="/mypages/app/manifest.json" />
<link rel="mask-icon" href="/mypages/app/safari-pinned-tab.svg" color="#5bbad5" />
<meta name="msapplication-TileColor" content="#68b7d4" />
<meta name="theme-color" content="#68b7d4" />
<title>Mina sidor</title>
<meta name="description" content="Sök bland lediga bostäder i hela regionen." />
<meta property="og:title" content="Lediga bostäder" />
<meta property="og:description" content="Sök bland lediga bostäder i hela regionen." />
<meta property="og:url" content="https://www.boplatssyd.se/mypages/app/" />
<meta property="og:image" content="" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta property="og:type" content="website" />
<meta property="og:locale" content="sv_SE" />
<script type="module" crossorigin src="/mypages/app/assets/index-ca5ddbcc.js"></script>
<link rel="modulepreload" crossorigin href="/mypages/app/assets/@util-e6166646.js">
<link rel="modulepreload" crossorigin href="/mypages/app/assets/@bootstrap-17c847c2.js">
<link rel="modulepreload" crossorigin href="/mypages/app/assets/vendor-a4a66ed9.js">
<link rel="stylesheet" href="/mypages/app/assets/index-20b264da.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
</body>
</html>
Specifically, the dynamic content should be generated inside the following div:
<div id="root"></div>
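I can only see the dynamic content when I use "Inspect element" on one of the dynamically generated elements. I'd appreciate any insight on this.
You aren't waiting for the page to finish loading. The content inside the root div is rendered by JavaScript after the initial HTML arrives, so reading the HTML immediately after driver.get() returns only the empty shell. Here is how to wait for a specific element, for example footer: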
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://www.boplatssyd.se/mypages/app/')
footer = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'footer')))
html = driver.page_source
print(html)
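If the page has no footer, or the footer appears before the rest of the content, another option is to wait until the root div itself has children. The following is only a sketch under assumptions: `root_is_populated` and `fetch_rendered_html` are hypothetical helper names, and the condition assumes the app renders its content as direct children of the element with id "root", as the question's HTML suggests.

```python
def root_is_populated(driver, root_id="root"):
    """Custom wait condition: truthy once the element with id `root_id` has children.

    `root_id` defaults to the "root" container shown in the question (an assumption
    about where the app renders its content).
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    children = driver.find_elements("css selector", f"#{root_id} > *")
    return children if children else False


def fetch_rendered_html(url, timeout=10):
    """Hypothetical helper: load `url`, wait for the app to render, return the DOM."""
    # Imported here so the condition above can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # WebDriverWait.until accepts any callable that takes the driver and
        # polls it until it returns a truthy value or the timeout expires.
        WebDriverWait(driver, timeout).until(root_is_populated)
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_rendered_html('https://www.boplatssyd.se/mypages/app/')` should then return the page source once something has been rendered into the root div. If the app first renders a loading placeholder, this condition may fire too early, and waiting for a more specific element is the safer choice.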