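I'm trying to get the video length from a YouTube URL. Using Inspect in the web browser, I can see there is a line for it, a span element with the class ytp-time-duration.
I then use requests and BeautifulSoup to get it: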
import requests
from bs4 import BeautifulSoup
url = "https://www.youtube.com/watch?v=ANYyoutubeLINK"
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
duration_span = soup.find_all('span', class_='ytp-time-duration')
print(duration_span)
Neither soup.find_all nor soup.find works. What went wrong?
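The element you are searching for does not exist in the response. Without JS rendering, you won't be able to get the information you are looking for. Use selenium in headless mode and you will get the time. You can get the data either with BeautifulSoup or directly from the webdriver.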
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
chrome_options = Options()
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument("--headless") # Run in headless mode (no GUI)
driver = webdriver.Chrome(options=chrome_options)
URL = "https://www.youtube.com/watch?v=ANYyoutubeLINK"
driver.get(URL)
# Get the time directly from the webdriver
duration = driver.find_element(By.CLASS_NAME, 'ytp-time-duration')
print(f"From webdriver: {duration.text}")
# Get the time using BeautifulSoup on the rendered page source
soup = BeautifulSoup(driver.page_source, 'html.parser')
duration_span = soup.find('span', class_='ytp-time-duration')
print(f"From beautifulsoup: {duration_span.text}")
# Quit the webdriver
driver.quit()
Output:
From webdriver: 1:43
From beautifulsoup: 1:43
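The diagnosis above (the element is simply not in the HTML that requests receives) can be reproduced offline. The following is a minimal sketch using only the standard library's html.parser; the helper name has_class and both sample markup strings are made up for illustration. The first string mimics a page whose duration span is only created by a script, the second mimics the same page after a browser has run that script.

```python
from html.parser import HTMLParser


class _ClassFinder(HTMLParser):
    """Records whether any start tag carries the wanted CSS class."""

    def __init__(self, wanted: str) -> None:
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value and self.wanted in value.split():
                self.found = True


def has_class(html: str, wanted: str) -> bool:
    """Return True if the static HTML contains an element with class `wanted`."""
    finder = _ClassFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical markup: the span only exists after the script has run.
STATIC_HTML = """
<html><body>
<div id="player"></div>
<script>
  var span = document.createElement("span");
  span.className = "ytp-time-duration";
  span.textContent = "1:43";
  document.getElementById("player").appendChild(span);
</script>
</body></html>
"""

# Hypothetical markup: what the DOM looks like once the script has executed.
RENDERED_HTML = '<div id="player"><span class="ytp-time-duration">1:43</span></div>'

print(has_class(STATIC_HTML, "ytp-time-duration"))    # False
print(has_class(RENDERED_HTML, "ytp-time-duration"))  # True
```

An HTML parser never executes scripts, so it only sees the first form. A browser driven by selenium sees the second.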
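YouTube pages rely heavily on JavaScript. As a result, requests and BeautifulSoup may not give you a fully rendered view of the page you are trying to scrape.
selenium is a good fit for this situation because it drives a real browser, which executes the page's JavaScript.
So, for your specific use case: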
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement
URL = "https://www.youtube.com/watch?v=GSQo5zlAe2w" # or whatever
TIMEOUT = 5
# if the webelement is not visible, e.text may return an empty string
# if that happens, it's best to check the textContent property
def etext(e: WebElement) -> str:
    if e:
        if t := e.text.strip():
            return t
        if (p := e.get_property("textContent")) and isinstance(p, str):
            return p.strip()
    return ""


if __name__ == "__main__":
    opt = webdriver.ChromeOptions()
    opt.add_argument("--headless=new")
    with webdriver.Chrome(options=opt) as driver:
        driver.get(URL)
        wait = WebDriverWait(driver, TIMEOUT)
        ec = EC.presence_of_element_located
        sel = By.CSS_SELECTOR, "span.ytp-time-duration"
        span = wait.until(ec(sel))
        print(etext(span))
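The text scraped from the player is a display string such as 1:43, not a number. If you need seconds, a small conversion helper is enough. This is only a sketch: the function name duration_to_seconds is hypothetical, and it assumes the text is SS, M:SS or H:MM:SS, which matches the output shown earlier but is not otherwise verified here.

```python
def duration_to_seconds(text: str) -> int:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a number of seconds."""
    parts = text.strip().split(":")
    # Reject anything that is not one to three groups of digits
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds = 0
    for part in parts:
        # Each step to the right is a unit 60 times smaller
        seconds = seconds * 60 + int(part)
    return seconds


print(duration_to_seconds("1:43"))     # 103
print(duration_to_seconds("1:02:03"))  # 3723
```

An empty string, which the comment in the selenium code warns about for invisible elements, raises ValueError here instead of silently becoming 0.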
Here is how to get the video length from the API, without using selenium:
import requests
from urllib.parse import urlsplit, parse_qs


def get_youtube_video_length_seconds(video_link):
    video_id = parse_qs(urlsplit(video_link).query).get('v')[0]
    json_data = {
        'context': {
            'client': {
                'clientName': 'WEB',
                'clientVersion': '2.20250116.10.00',
            },
        },
        'videoId': video_id
    }
    response = requests.post('https://www.youtube.com/youtubei/v1/player', json=json_data)
    return response.json().get('videoDetails').get('lengthSeconds')


video_link = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
video_length = get_youtube_video_length_seconds(video_link)
print(video_length)
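The function above returns the raw lengthSeconds value. To display it the way the player does, it can be formatted as M:SS or H:MM:SS. A minimal sketch follows; the name format_length is hypothetical, and it assumes the value is either an int or a string of digits (int() accepts both).

```python
def format_length(length_seconds) -> str:
    """Format a number of seconds as 'M:SS', or 'H:MM:SS' for an hour or more."""
    total = int(length_seconds)  # accepts an int or a string of digits
    if total < 0:
        raise ValueError("length must not be negative")
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_length("103"))  # 1:43
print(format_length(3723))   # 1:02:03
```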