BeautifulSoup does not show some tags that are in the HTML page

Question  Votes: 3  Answers: 2

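If I open the page in a browser (the URL is in the code below) and inspect it, I can see the images on the page as img tags.

But when I fetch the page with requests and parse it with BeautifulSoup, those same images are not there. What am I missing?

The code runs without errors, and the request returns a status_code of 200.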

import requests
from bs4 import BeautifulSoup

url = 'https://mangadex.org/chapter/435396/2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

page = requests.get(url,headers=headers)
print(page.status_code)

soup = BeautifulSoup(page.text,'html.parser')
img_tags = soup.find_all('img')
for img in img_tags:
    print(img)

Edit:

As suggested, the Selenium option works fine. But is there a way to make it as fast as BeautifulSoup?

python python-3.x web-scraping beautifulsoup
2 Answers
0
votes

You can use the site's API to get the images. The code below fetches the list of images for the chapter and prints their URLs:

import requests

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Referer': 'https://mangadex.org/chapter/435396/2',
    'DNT': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.86 Safari/537.36',
}

params = (
    ('id', '435396'),
    ('type', 'chapter'),
    ('baseURL', '/api'),
)

response = requests.get('https://mangadex.org/api/', headers=headers, params=params)
data = response.json()

img_base_url = "https://s4.mangadex.org/data"
img_hash = data["hash"]
img_names = data["page_array"]

for img in img_names:
    print(f"{img_base_url}/{img_hash}/{img}")

Output:

https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x1.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x2.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x3.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x4.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x5.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x6.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x7.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x8.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x9.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x10.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x11.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x12.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x13.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x14.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x15.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x16.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x17.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x18.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x19.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x20.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x21.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x22.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x23.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x24.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x25.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x26.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x27.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x28.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x29.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x30.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x31.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x32.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x33.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x34.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x35.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x36.png
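
If you want to reuse this for other chapters, the same steps could be wrapped roughly as below. This is only a sketch: the function names `build_image_urls` and `fetch_chapter` are made up for illustration, the `timeout` value is arbitrary, and it assumes the endpoint still returns JSON with `hash` and `page_array` keys as in the answer above. The image server is kept as a parameter because the answer hard-codes it and it may differ between chapters.

```python
def build_image_urls(chapter, base_url="https://s4.mangadex.org/data"):
    """Join the image base URL, the chapter hash, and each page file name.

    `chapter` is the decoded JSON dict; the keys "hash" and "page_array"
    are the ones used in the answer above.
    """
    base = base_url.rstrip("/")
    return [f"{base}/{chapter['hash']}/{name}" for name in chapter["page_array"]]


def fetch_chapter(chapter_id, headers=None, timeout=10):
    """Fetch the chapter JSON from the endpoint used in the answer above."""
    # Imported here so build_image_urls can be used without requests installed.
    import requests

    params = {"id": str(chapter_id), "type": "chapter", "baseURL": "/api"}
    response = requests.get(
        "https://mangadex.org/api/", headers=headers, params=params, timeout=timeout
    )
    # Fail loudly on 4xx/5xx instead of trying to parse an error page as JSON.
    response.raise_for_status()
    return response.json()
```

Usage would then be something like `for url in build_image_urls(fetch_chapter(435396)): print(url)`. Since this is a single HTTP request with no browser involved, it should be much faster than rendering the page.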


1
vote

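The page relies on JavaScript that has to run before some of its elements are populated. requests only downloads the initial HTML and does not execute scripts, so those img tags never appear in page.text. You can use Selenium to run the page's JavaScript before accessing the images.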
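
A minimal sketch of that approach, assuming Selenium 4 with a local Chrome install. The names `extract_img_srcs` and `rendered_img_srcs` are hypothetical, and the default wait selector `"img"` is a placeholder, since the actual selector for the reader images is not given in the question. Running the browser headless is one way to reduce Selenium's overhead. The parsing step uses the standard library's `html.parser` to keep the sketch dependency-light; passing `driver.page_source` to `BeautifulSoup(..., 'html.parser')` as in the question should work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                self.srcs.append(src)


def extract_img_srcs(html, page_url):
    """Return absolute image URLs found in an HTML string."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    # Resolve relative paths against the page they came from.
    return [urljoin(page_url, src) for src in collector.srcs]


def rendered_img_srcs(url, wait_css="img", timeout=15):
    """Load url in headless Chrome, wait for wait_css, return image URLs."""
    # Imported here so extract_img_srcs stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's scripts have produced a matching element.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, wait_css)
        )
        return extract_img_srcs(driver.page_source, url)
    finally:
        driver.quit()  # always release the browser process
```

Calling `rendered_img_srcs('https://mangadex.org/chapter/435396/2')` would then return the image URLs once the scripts have run. It will still be slower than a plain requests call, because a real browser has to start and render the page.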
