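If I visit this page here, I can see the images on the page inside img tags when I inspect it.
But when I fetch the page with requests and parse it with BeautifulSoup, I can't access the same images. What am I missing here?
The code itself runs fine, and I get 200 as the status_code from the request.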
import requests
from bs4 import BeautifulSoup

url = 'https://mangadex.org/chapter/435396/2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

page = requests.get(url, headers=headers)
print(page.status_code)

# Parse the raw HTML returned by the server and print every <img> tag found
soup = BeautifulSoup(page.text, 'html.parser')
img_tags = soup.find_all('img')
for img in img_tags:
    print(img)
Edit:
As suggested, the Selenium option works fine. But is there a way to make it as fast as the BeautifulSoup approach?
You can use the API to get the images. The code below fetches all the images for the page and prints their URLs:
import requests

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Referer': 'https://mangadex.org/chapter/435396/2',
    'DNT': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.86 Safari/537.36',
}

# Ask the JSON API for the metadata of chapter 435396
params = (
    ('id', '435396'),
    ('type', 'chapter'),
    ('baseURL', '/api'),
)

response = requests.get('https://mangadex.org/api/', headers=headers, params=params)
response.raise_for_status()  # fail loudly instead of parsing an error page
data = response.json()

# Each page image lives at <base URL>/<chapter hash>/<file name>
img_base_url = "https://s4.mangadex.org/data"
img_hash = data["hash"]
img_names = data["page_array"]
for img in img_names:
    print(f"{img_base_url}/{img_hash}/{img}")
Output:
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x1.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x2.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x3.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x4.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x5.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x6.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x7.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x8.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x9.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x10.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x11.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x12.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x13.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x14.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x15.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x16.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x17.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x18.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x19.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x20.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x21.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x22.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x23.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x24.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x25.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x26.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x27.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x28.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x29.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x30.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x31.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x32.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x33.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x34.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x35.png
https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x36.png
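Once you have the URLs, the images still have to be downloaded, and fetching them one after another means waiting on each request in turn. A minimal sketch, assuming the image host accepts plain GET requests and may want a Referer header (an assumption, not something confirmed above): it fetches the URLs in parallel with a standard-library thread pool. `fetch_bytes` and `download_all` are hypothetical helper names, and `urllib.request` stands in for `requests` only so the block has no third-party dependencies.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url, referer="https://mangadex.org/chapter/435396/2"):
    """Download one URL and return its body as bytes (hypothetical helper)."""
    # Sending a Referer and a browser-like User-Agent is an assumption;
    # some image hosts reject requests without them.
    req = urllib.request.Request(
        url, headers={"Referer": referer, "User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def download_all(urls, fetch=fetch_bytes, max_workers=8):
    """Run `fetch` on every URL in worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Offline demo with a stand-in fetcher; pass fetch_bytes to hit the network.
    print(download_all(["x1.png", "x2.png"], fetch=lambda u: f"fetched {u}"))
```

Threads suit this job because each download spends most of its time waiting on the network; `pool.map` returns results in the same order as the input URLs, so page N's bytes stay at index N-1.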
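The page relies on JavaScript that has to run before some of its elements are populated. You can use Selenium to execute the page's JavaScript before accessing the images.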
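To see why requests and BeautifulSoup come back empty, compare the HTML the server sends with what the browser's inspector shows after scripts have run. The sketch below uses the standard library's `html.parser` (so it runs without bs4) on two hypothetical pages: one where a script creates the `<img>` element at runtime, and one where the tag is already in the server's HTML. The markup is invented for illustration and is not MangaDex's actual HTML; `ImgCollector` and `find_img_srcs` are hypothetical names.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "img":
            self.srcs.append(dict(attrs).get("src"))


def find_img_srcs(html):
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


# Hypothetical response for a page whose images are added by JavaScript:
# the image path only appears inside a script, so no <img> element exists yet.
js_shell = """
<html><body>
  <div id="reader"></div>
  <script>
    const img = document.createElement("img");
    img.src = "/data/page1.png";
    document.getElementById("reader").appendChild(img);
  </script>
</body></html>
"""

# Hypothetical server-rendered page for comparison.
static_page = '<html><body><img src="/data/page1.png"></body></html>'

print(find_img_srcs(js_shell))     # []
print(find_img_srcs(static_page))  # ['/data/page1.png']
```

A parser only sees the markup it is given; script contents are treated as text, so an image created by `document.createElement` never shows up until a browser (or Selenium) actually runs the script.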