Problem extracting text from an <h1> tag with BeautifulSoup

Question

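I am scraping the page https://www.eloratings.net/1999. My code extracts elements correctly, but when I try to print only the text inside the <h1> tag, nothing is displayed. Everything else in the code works. Here is my code: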

import requests
from bs4 import BeautifulSoup

# Send a GET request to the page
url = "https://www.eloratings.net/1999"
response = requests.get(url)

# Parse the content with BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the main div containing the h1 tag
main_div = soup.find('div', id='main')

# Check if main_div is found
if main_div:
    h1_tag = main_div.find('h1')
    if h1_tag:
        # Print the complete content of the h1 tag
        print(f"Full content of the h1 tag: {h1_tag.get_text()}")
    else:
        print("No h1 tag found.")
else:
    print("No div with the ID 'main' found.")

Although the rest of the code works, the following snippet, which should extract the text from the <h1> tag, does not:

main_div = soup.find('div', id='main')
if main_div:
    h1_tag = main_div.find('h1')
    if h1_tag:
        print(f"Full content of the h1 tag: {h1_tag.get_text()}")

Does anyone know why the text is missing? Also, any help with extracting and saving the entire table would be greatly appreciated!

Answer 1

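The main issue is that the content is loaded dynamically by JavaScript, so it is not present in the static response that the requests library receives from the server.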
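One way to check this diagnosis is to compare what the parser sees before and after the scripts have run. The sketch below is only an illustration: h1_text is a hypothetical helper, and both HTML strings are made-up stand-ins (not the real markup of the page) for a static response and a browser-rendered one.

```python
from bs4 import BeautifulSoup


def h1_text(html: str) -> str:
    """Return the stripped text of the first <h1>, or '' if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return h1.get_text(strip=True) if h1 else ""


# Made-up stand-in for a static response: the <h1> exists but is still empty,
# because a script is expected to fill it in later.
static_html = '<div id="main"><h1 id="title"></h1></div><script src="app.js"></script>'

# Made-up stand-in for the same page after a browser has executed the script.
rendered_html = '<div id="main"><h1 id="title">Example title</h1></div>'

print(repr(h1_text(static_html)))    # ''
print(repr(h1_text(rendered_html)))  # 'Example title'
```

If h1_text(response.text) returns an empty string for the real page while the browser shows a heading, that is consistent with the tag being populated by JavaScript after the initial load.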

You can use selenium to mimic browser behavior and render the content:

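from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
# Load the page in a real browser so its JavaScript runs
driver.get('https://www.eloratings.net/1999')
# Give the scripts a moment to populate the page
time.sleep(2)

# Parse the rendered HTML; name the parser explicitly to avoid a bs4 warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(soup.h1.get_text())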
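For the second part of the question (extracting and saving the whole table), a minimal sketch might look like the following. It assumes the rendered page exposes its data as a regular <table>, which has not been verified for this site. The helpers extract_rows and rows_to_csv, the default selectors, and the sample data are all hypothetical and only there for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_rows(html, row_selector="table tr", cell_selector="th, td"):
    """Collect the text of every cell, row by row, from rendered HTML.

    The default selectors assume a plain <table>; adjust them if the page
    builds its grid from other elements.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select(row_selector):
        cells = [cell.get_text(" ", strip=True) for cell in row.select(cell_selector)]
        if cells:  # skip rows that contain no matching cells
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for the rendered page source.
sample_html = """
<table>
  <tr><th>Rank</th><th>Team</th><th>Rating</th></tr>
  <tr><td>1</td><td>Team A</td><td>2000</td></tr>
  <tr><td>2</td><td>Team B</td><td>1950</td></tr>
</table>
"""

rows = extract_rows(sample_html)
print(rows_to_csv(rows))
```

With the Selenium snippet above, you would pass driver.page_source (read while the browser is still open) instead of sample_html, then write the returned string to a file opened with newline="" and encoding="utf-8". If the browser's developer tools show that the grid is built from div elements rather than a <table>, change row_selector and cell_selector to match.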
