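I'm trying to extract "Reported EPS Basic" from the page https://finance.yahoo.com/quote/1928.HK/financials?p=1928.HK. When I run my code, the values 0.23 and 0.2 appear in the markup below. How can I extract these numbers from this source?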
<div class="D(tbc) Ta(end) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)-pnclg" data-test="fin-col" data-reactid="292">0.23</div>
<div class="D(tbc) Ta(end) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)-pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor)" data-test="fin-col" data-reactid="293">0.20</div>
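Both numbers sit in div elements that carry data-test="fin-col". Here is a minimal sketch, assuming the markup looks like the two elements quoted above. It parses the fragment with BeautifulSoup and reads the text of every fin-col cell. SNIPPET is an abbreviated copy of that HTML with the class lists trimmed, not the live page.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the two cells quoted above (class lists trimmed).
SNIPPET = """
<div class="D(tbc) Ta(end)" data-test="fin-col" data-reactid="292">0.23</div>
<div class="D(tbc) Ta(end) Bgc($lv1BgColor)" data-test="fin-col" data-reactid="293">0.20</div>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Select every div marked data-test="fin-col" and convert its text to a float.
values = [float(div.get_text(strip=True)) for div in soup.select('div[data-test="fin-col"]')]
print(values)  # [0.23, 0.2]
```

Selecting on data-test avoids hard-coding the long Atomic CSS class strings, which are tied to page styling.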
My code:
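import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/1928.HK/financials?p=1928.HK"
result = requests.get(url)
result.raise_for_status()
result.encoding = "utf-8"  # affects result.text, not result.content
src = result.content
soup = BeautifulSoup(src, 'lxml')
#soup = BeautifulSoup(src, 'html5lib')
#print(soup.prettify())
print(soup)
# Write the decoded text; str(src) would write the bytes repr (b'...').
with open('soup.txt', 'w', encoding='utf-8') as f:
    f.write(result.text)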
Try this:
import requests
import bs4

url = 'https://finance.yahoo.com/quote/1928.HK/financials?p=1928.HK'
data = requests.get(url)
soup = bs4.BeautifulSoup(data.text, 'html.parser')
# The two cells are identified by their data-reactid attributes.
print(soup.find('div', attrs={"data-reactid": "292"}).text)
print(soup.find('div', attrs={"data-reactid": "293"}).text)
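The data-reactid values are assigned by React during rendering, so they can change whenever the page structure changes. Yahoo may also return different markup to clients that send no browser-like User-Agent, which is an assumption because the behaviour varies. The sketch below takes a somewhat sturdier route. The hypothetical helpers fetch_html, to_number and parse_fin_cols send a User-Agent header (the exact string is illustrative) and select cells by data-test="fin-col". A cell that is not a number, for example "-", becomes None.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: an illustrative browser-like User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url: str) -> str:
    """Download a page and return its decoded HTML (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def to_number(text: str):
    """Convert text such as '1,234.5' to a float; return None for '-' or blanks."""
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_fin_cols(html: str) -> list:
    """Return the numeric value of every div marked data-test="fin-col"."""
    soup = BeautifulSoup(html, "html.parser")
    return [to_number(div.get_text()) for div in soup.select('div[data-test="fin-col"]')]
```

Usage: html = fetch_html('https://finance.yahoo.com/quote/1928.HK/financials?p=1928.HK'), then print(parse_fin_cols(html)). This returns every fin-col cell in the table, not only the "Reported EPS Basic" row, so the caller still has to pick out the cells for that row.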