Getting the video length from YouTube with requests and BeautifulSoup

Question    Votes: 0    Answers: 3

When getting the video length from a YouTube URL, inspecting the page in a web browser shows a line containing the duration:

[screenshot of the inspected element]

I then used requests and BeautifulSoup to get it:

import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/watch?v=ANYyoutubeLINK"

response = requests.get(url)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

duration_span = soup.find_all('span', class_='ytp-time-duration')

print(duration_span)

Neither soup.find_all nor soup.find works. What is going wrong?

python web-scraping beautifulsoup request youtube
3 Answers
2
votes

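The element you are searching for is not present in the response. Without JavaScript rendering, you won't be able to get the information you are looking for. Use selenium in headless mode and you will get the time. You can read the data with BeautifulSoup or directly from the WebDriver.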

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

chrome_options = Options()
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)

driver = webdriver.Chrome(options=chrome_options)

URL = "https://www.youtube.com/watch?v=ANYyoutubeLINK"
driver.get(URL)

# Get the time directly from the webdriver
duration = driver.find_element(By.CLASS_NAME, 'ytp-time-duration')
print(f"From webdriver: {duration.text}")

# Get the time using BeautifulSoup on the rendered page source
soup = BeautifulSoup(driver.page_source, 'html.parser')
duration_span = soup.find('span', class_='ytp-time-duration')
print(f"From beautifulsoup: {duration_span.text}")

# Quit the webdriver
driver.quit()

Output:

From webdriver: 1:43
From beautifulsoup: 1:43
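
Both values come back as clock-style text such as 1:43. If the length is needed as a number, a small helper can convert it. The sketch below is not part of the answer: clock_to_seconds is a hypothetical name, and it assumes the player text has the form SS, M:SS, or H:MM:SS.

```python
def clock_to_seconds(text: str) -> int:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a total number of seconds.

    Hypothetical helper; assumes the text contains only digits and colons.
    """
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognised duration: {text!r}")
    seconds = 0
    for part in parts:
        # Each field to the right is worth 1/60 of the one before it.
        seconds = seconds * 60 + int(part)
    return seconds


print(clock_to_seconds("1:43"))     # 103
print(clock_to_seconds("1:02:03"))  # 3723
```

An empty string raises ValueError, which is useful here because element.text can be empty when the element is not visible.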

1
vote

YouTube pages rely heavily on JavaScript. As a result, requests and BeautifulSoup may not give you a fully rendered view of the page you are trying to scrape.
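
One way to check this for yourself is to test whether the class appears anywhere in the HTML that requests received. The sketch below uses only the standard library; ClassFinder and html_has_class are hypothetical names, and the two HTML strings are simplified stand-ins, not real YouTube markup.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the target CSS class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value and self.target_class in value.split():
                self.found = True


def html_has_class(html: str, target_class: str) -> bool:
    finder = ClassFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Simplified stand-ins for illustration only:
RAW_HTML = "<html><body><div id='player'></div><script>/* builds the player */</script></body></html>"
RENDERED_HTML = "<html><body><div id='player'><span class='ytp-time-duration'>1:43</span></div></body></html>"

print(html_has_class(RAW_HTML, "ytp-time-duration"))       # False
print(html_has_class(RENDERED_HTML, "ytp-time-duration"))  # True
```

In practice you would pass response.text instead of RAW_HTML. If the result is False, no parser can find the element, because it is only created later by JavaScript.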

selenium is a good fit for this situation because it supports JavaScript.

So, for your specific use case:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

URL = "https://www.youtube.com/watch?v=GSQo5zlAe2w"  # or whatever
TIMEOUT = 5

# if the webelement is not visible, e.text may return an empty string
# if that happens, it's best to check the textContent property
def etext(e: WebElement) -> str:
    if e:
        if t := e.text.strip():
            return t
        if (p := e.get_property("textContent")) and isinstance(p, str):
            return p.strip()
    return ""

if __name__ == "__main__":
    opt = webdriver.ChromeOptions()
    opt.add_argument("--headless=new")
    with webdriver.Chrome(options=opt) as driver:
        driver.get(URL)
        wait = WebDriverWait(driver, TIMEOUT)
        ec = EC.presence_of_element_located
        sel = By.CSS_SELECTOR, "span.ytp-time-duration"
        span = wait.until(ec(sel))
        print(etext(span))

0
votes

Here is how to get the video length from the API without using selenium:

import requests
from urllib.parse import urlsplit, parse_qs


def get_youtube_video_length_seconds(video_link):
    video_id = parse_qs(urlsplit(video_link).query).get('v')[0]

    json_data = {
        'context': {
            'client': {
                'clientName': 'WEB',
                'clientVersion': '2.20250116.10.00',
            },
        },
        'videoId': video_id
    }

    response = requests.post('https://www.youtube.com/youtubei/v1/player', json=json_data, timeout=10)
    response.raise_for_status()  # fail early on an HTTP error status

    return response.json().get('videoDetails').get('lengthSeconds')


video_link = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
video_length = get_youtube_video_length_seconds(video_link)
print(video_length)
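
The value printed above is a number of seconds, not the M:SS text shown in the player. A minimal sketch of the conversion follows. format_duration is a hypothetical helper, and it assumes lengthSeconds arrives as an integer or a string of digits, which is an assumption about the response, not something the answer states.

```python
def format_duration(length_seconds) -> str:
    """Format a second count as 'M:SS', or 'H:MM:SS' once it reaches an hour.

    Hypothetical helper; accepts an int or a string of digits.
    """
    total = int(length_seconds)
    if total < 0:
        raise ValueError("duration cannot be negative")
    minutes, seconds = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_duration("103"))  # 1:43
print(format_duration(3723))   # 1:02:03
```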