How do I use BeautifulSoup in Python to scrape a "span" tag by its "data-reactid"?

Question

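I am trying to extract a stock's real-time price from Yahoo Finance. The price is inside a "span" tag that has both a "class" and a "data-reactid" attribute, but I cannot get the information out of this span tag.

When I run my code, I get no output and no error.

I have tried the answers to almost every similar question, but none of them worked for me.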

<!-- HTML code -->
<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">197.00</span>

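# Python script
from urllib.request import urlopen as u_req
from bs4 import BeautifulSoup as soup

my_url = "https://finance.yahoo.com/quote/AAPL?p=AAPL&.tsrc=fin-srch"
u_client = u_req(my_url)

page_html = u_client.read()
u_client.close()

page_soup = soup(page_html, "html.parser")
# find() returns None (no exception) when no <span> has data-reactid="34"
container = page_soup.find('span', {"data-reactid":'34'})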

I want the output to be "197.00" (the stock's real-time price).
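
A minimal offline sketch of the lookup, using the HTML snippet from the question (the helper name `price_by_reactid` is hypothetical). It shows that the attribute search itself works on static markup, and that `find` returns `None` instead of raising when no tag matches, so a mismatch fails silently unless the result is checked:

```python
from bs4 import BeautifulSoup

# Snippet copied from the question; the live page may contain different markup.
SNIPPET = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">197.00</span>'


def price_by_reactid(html, reactid):
    """Return the text of the first <span> with the given data-reactid, or None."""
    page = BeautifulSoup(html, "html.parser")
    # "data-reactid" cannot be a Python keyword argument, so it goes in the attrs dict.
    span = page.find("span", attrs={"data-reactid": reactid})
    return span.text if span is not None else None


print(price_by_reactid(SNIPPET, "34"))  # 197.00
print(price_by_reactid(SNIPPET, "14"))  # None
```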

Answer 1

When the URL is read by the script, the data-reactid somehow changes to 14.

page_soup = soup(page_html, "html.parser")
container = page_soup.find('span', {"data-reactid":'14'})
if container:
    print(container.text)
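
Rather than guessing the new number, one way to see which `data-reactid` values the downloaded HTML actually contains is to list every `<span>` that carries the attribute. This is a sketch; `SAMPLE_HTML` is hypothetical markup standing in for `page_html`, and `spans_with_reactid` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML downloaded by the question's script.
SAMPLE_HTML = """
<div>
  <span data-reactid="12">Apple Inc. (AAPL)</span>
  <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="14">197.00</span>
  <span>no reactid here</span>
</div>
"""


def spans_with_reactid(html):
    """Return (data-reactid, text) pairs for every <span> that has a data-reactid."""
    page = BeautifulSoup(html, "html.parser")
    # attrs={"data-reactid": True} matches any tag on which the attribute is present.
    return [
        (span["data-reactid"], span.get_text(strip=True))
        for span in page.find_all("span", attrs={"data-reactid": True})
    ]


for reactid, text in spans_with_reactid(SAMPLE_HTML):
    print(reactid, text)
```

Since the value seen in the browser differs from the one the script receives, hard-coding either number is fragile.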

Answer 2

Since data-reactid can change, I would select by a unique class instead. Selecting by class is also faster.

import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://finance.yahoo.com/quote/AAPL/')
soup = bs(r.content, 'lxml')
print(soup.select_one(r'.Mb\(-4px\)').text)  # raw string keeps the CSS escapes intact
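
The escaped selector can be checked offline against the snippet from the question. A minimal sketch using the built-in `html.parser` (so `lxml` is not required); `find` with `class_` is shown as an alternative that needs no escaping, because BeautifulSoup compares `class_` against each individual class of the multi-valued `class` attribute:

```python
from bs4 import BeautifulSoup

# Snippet copied from the question.
SNIPPET = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">197.00</span>'
page = BeautifulSoup(SNIPPET, "html.parser")

# In CSS the parentheses must be escaped; the raw string preserves the backslashes.
by_css = page.select_one(r".Mb\(-4px\)")

# find() matches class_ against each single class, so no escaping is needed here.
by_find = page.find("span", class_="Mb(-4px)")

print(by_css.text, by_find.text)
```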

Answer 3

I opened the URL in Chrome and pressed F12. The Network tab shows that the page sends this query: https://query1.finance.yahoo.com/v8/finance/chart/AAPL?region=US&lang=en-US&includePrePost=false&interval=2m&range=1d&corsDomain=finance.yahoo.com&.tsrc=finance

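I suggest exploring the underlying AJAX calls instead: they appear to return well-formatted JSON, and it is worth checking which of the URL's parameters can be modified.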
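
A minimal sketch of that approach using only the standard library. The query parameters are a subset of those in the URL above; the JSON layout read by `extract_price` (`chart`, then `result`, then `meta`, then `regularMarketPrice`) and the need for a `User-Agent` header are assumptions about an undocumented endpoint that may change or reject requests. The helper names are hypothetical, and only `fetch_price` needs network access:

```python
import json
import urllib.parse
import urllib.request

# Endpoint observed in the browser's Network tab; undocumented and may change.
CHART_URL = "https://query1.finance.yahoo.com/v8/finance/chart/{symbol}"


def build_chart_url(symbol, interval="2m", range_="1d"):
    """Build the chart URL with a subset of the query parameters seen in the browser."""
    params = {
        "region": "US",
        "lang": "en-US",
        "includePrePost": "false",
        "interval": interval,
        "range": range_,
    }
    return CHART_URL.format(symbol=urllib.parse.quote(symbol)) + "?" + urllib.parse.urlencode(params)


def extract_price(payload):
    """Read the price from the decoded JSON, or return None if the assumed keys are missing."""
    # Assumed layout: {"chart": {"result": [{"meta": {"regularMarketPrice": ...}}]}}
    results = (payload.get("chart") or {}).get("result") or []
    if not results:
        return None
    return (results[0].get("meta") or {}).get("regularMarketPrice")


def fetch_price(symbol):
    """Download the chart JSON for one symbol and return its price (needs network access)."""
    # Assumption: a browser-like User-Agent may be required for the request to succeed.
    request = urllib.request.Request(build_chart_url(symbol), headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return extract_price(payload)


if __name__ == "__main__":
    # Offline check with a hand-written payload; call fetch_price("AAPL") for a live request.
    sample = {"chart": {"result": [{"meta": {"regularMarketPrice": 197.0}}]}}
    print(extract_price(sample))
```

Keeping the parsing in a separate function makes it easy to adjust if the real JSON turns out to be shaped differently.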

Answer 4

There are several ways to get it. Here is one of them:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://finance.yahoo.com/quote/AAPL')
soup = BeautifulSoup(res.text, 'lxml')
price = soup.select_one('#quote-market-notice').find_all_previous()[2].text
print(price)
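
Both lookups above depend on the page layout: `select_one` returns `None` when nothing matches (so `.text` raises `AttributeError`), and indexing `[2]` raises `IndexError` if fewer elements precede the notice. A small hypothetical helper, `select_text`, that returns `None` instead of crashing; the markup and the `#price` id are invented only to exercise it:

```python
from bs4 import BeautifulSoup


def select_text(page, selector):
    """Return the stripped text of the first match for a CSS selector, or None if nothing matches."""
    element = page.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup, not taken from the Yahoo page.
page = BeautifulSoup('<div id="price"><span>197.00</span></div>', "html.parser")
print(select_text(page, "#price span"))          # 197.00
print(select_text(page, "#quote-market-notice"))  # None
```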

Another way:

price = soup.select_one("[class*='smartphone_Mt'] span").text
print(price)
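
`[class*='smartphone_Mt']` is a CSS substring attribute selector: it matches any element whose `class` attribute contains that text anywhere, and ` span` then picks a `<span>` descendant. A sketch on hypothetical markup (the class value `smartphone_Mt(0px)` and the two spans are assumed for illustration, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the substring "smartphone_Mt" matters to the selector.
HTML = """
<div class="D(ib) Mt(-5px) smartphone_Mt(0px)">
  <span>197.00</span>
  <span>+1.00 (+0.51%)</span>
</div>
"""

page = BeautifulSoup(HTML, "html.parser")
# select_one returns only the first matching <span>, which here is the price.
price = page.select_one("[class*='smartphone_Mt'] span").text
print(price)  # 197.00
```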

Question votes: 3, answers: 4
