Scraping JSON from a script with Selenium

Question

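I have this web page, and I am trying to get this JSON

JSON

which "I think" is injected by JavaScript, so getting the response or page_source does not work.

Inside that JSON there is a .m3u8 link to the video, so I want that link in order to download it.

This is the code I have so far: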

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# options = Options()
# options.headless = True
# driver = webdriver.Chrome(options=options)
driver = webdriver.Chrome()

driver.get('https://ed.team/clase/49/464/2199')

# find_element_by_* was removed in Selenium 4; use find_element(By..., ...) instead
usuario = driver.find_element(By.NAME, "email")
usuario.clear()
usuario.send_keys("")

contra = driver.find_element(By.NAME, "password")
contra.clear()
contra.send_keys("")

driver.find_element(By.CSS_SELECTOR, "#__next > main > section > form > div:nth-child(3) > input").click()  # login button

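My script only logs in to the page and nothing more; I don't know how to continue from there.

If anyone knows how to help me, I would really appreciate it! Thanks!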

Answer 1

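It would have been much easier if you had included the contents of the script element in your question instead of an image of it. In any case, get the HTML of that script element, then use the re module to extract the JSON: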

import re
import json

script_html = '''<script>

__NEXT_DATA__ = { 
   "a": "b"
};
'''
# clean up the HTML
script_html = script_html.replace('\n', ' ')

script_re = re.compile(r'__NEXT_DATA__ = ({.*})', flags=re.MULTILINE)
raw_json = script_re.search(script_html).group(1)
parsed = json.loads(raw_json)

print(raw_json)
print(parsed)

Output:

{     "a": "b" }
{'a': 'b'}
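
The question's goal is the .m3u8 link inside that JSON. The thread does not show how the JSON is laid out, so one possible approach is to walk the parsed object recursively and collect every string that contains ".m3u8". This is only a sketch: the helper name find_m3u8_urls and the sample data are hypothetical and are not taken from the real page.

```python
import json


def find_m3u8_urls(node):
    """Recursively collect every string value that contains '.m3u8'."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_m3u8_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_m3u8_urls(item))
    elif isinstance(node, str) and ".m3u8" in node:
        found.append(node)
    return found


# Hypothetical sample; the real __NEXT_DATA__ layout is not shown in the thread.
sample = json.loads(
    '{"props": {"video": {"src": "https://example.com/v/index.m3u8"}, "tags": ["a", "b"]}}'
)
urls = find_m3u8_urls(sample)
print(urls)
```

Passing the parsed dictionary from the snippet above to find_m3u8_urls would return a list, which is empty when no matching string exists.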

Answer 2

To extract the JSON data from the web page with Selenium and BeautifulSoup, you don't need re!

import json
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://ed.team/clase/49/464/2199'  # the page from the question

driver = webdriver.Firefox()
driver.get(url)
# Wait for the page to load
driver.implicitly_wait(10)  # assumed value in seconds; the original argument was cut off
# Parse the page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Extract the JSON data from the script tag
# (.string is an assumed accessor; the original line was cut off after the dot)
script_data = soup.select_one("html body script#__NEXT_DATA__").string
# following abdusco's answer
cleaned_data = script_data.replace('\n', ' ')
json_data = json.loads(cleaned_data)
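
For trying the same idea without a browser or BeautifulSoup, here is a minimal sketch that uses only the standard library. It assumes the script tag carries id="__NEXT_DATA__" and holds plain JSON, as the selector above suggests. The names ScriptById and extract_next_data and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser


class ScriptById(HTMLParser):
    """Collect the text inside the <script> tag that has a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def extract_next_data(page_source):
    """Return the parsed JSON held in <script id="__NEXT_DATA__">."""
    parser = ScriptById("__NEXT_DATA__")
    parser.feed(page_source)
    parser.close()
    return json.loads("".join(parser.chunks))


# Hypothetical stand-in for driver.page_source
sample_html = '''<html><body>
<script id="__NEXT_DATA__" type="application/json">{
  "a": "b"
}</script>
</body></html>'''
data = extract_next_data(sample_html)
print(data)
```

In the Selenium script, driver.page_source would be passed in place of sample_html once the login has completed. If the tag is missing, json.loads raises an error, so a real script would want to handle that case.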
