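I am trying to scrape a table with BeautifulSoup. Three of my first four attempts did not work, and I don't know why!
In the fourth approach I tried pandas, but the result is not specific enough.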
import requests
import bs4
res = requests.get(
"https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html")
soup = bs4.BeautifulSoup(res.text, 'lxml')
# 1st try by copy selector from inspect element
table = soup.find_all(
'#mc_content > section > section > div.clearfix.stat_container > div.columnst.FR.wbg.brdwht > div > div.bsr_table.hist_tbl_hm.PR.Ohidden')
print(table)
# 2nd try by specifically writing class by attribute method
table = soup.find_all(
'div', attrs={'class': 'bsr_table.hist_tbl_hm.PR.Ohidden'})
print(table)
# 3rd conventional style
table = soup.find('table')
table_rows = table.find('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text() for i in td]
    print(td)
import pandas as pd
# 4th by pandas
dfs = pd.read_html(
'https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html')
for df in dfs:
    print(df)
The output I get:
0 Hindustan Aeron Add to Watchlist | Portfolio... ... 627.53
1 5-Day ... NaN
2 10-Day ... NaN
3 30-Day ... NaN
4 3-Day ... NaN
5 5-Day ... NaN
6 8-Day ... NaN
7 TAAL Enterprise Add to Watchlist | Portfolio... ... 135.34
8 5-Day ... NaN
9 10-Day ... NaN
10 30-Day ... NaN
11 3-Day ... NaN
12 5-Day ... NaN
13 8-Day ... NaN
14 Taneja Aerospac Add to Watchlist | Portfolio... ... 21.76
15 5-Day ... NaN
16 10-Day ... NaN
17 30-Day ... NaN
18 3-Day ... NaN
19 5-Day ... NaN
20 8-Day ... NaN
I want to extract the following table:
<div class="bsr_table hist_tbl_hm PR Ohidden">
<div class="data_table_ajax_loading" id="data_table_ajax_loading" style="display: none;"></div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<thead>
<tr>
<th class="TAL" width="155" valign="top" align="left"><a href="javascript:callsort(1)" class="bl_12" style="color: #ffffff;"><b>Company Name</b></a></th>
<th width="75">Open</th>
<th width="75">High</th>
<th width="80">Low</th>
<th width="85">Last Price</th>
<th width="80">Prev Price</th>
<th width="85">Change</th>
<th width="75"><a href="javascript:callsort(2)" class="bl_12" style="color: #ffffff;"><b>% Chg</b></a></th>
<th class="PR" width="90"><span id="th_name">5 Day Performance</span>
<div class="dropdownchng"> <span class="bluarw MT10 MR2"></span>
<ul>
<li><a href="javascript:;" onclick="display('performance');">5 Day Performance</a></li>
<li><a href="javascript:;" onclick="display('vol');">Volume</a></li>
<li><a href="javascript:;" onclick="display('lc');">Lower Circuit</a></li>
<li><a href="javascript:;" onclick="display('uc');">Upper Circuit</a></li>
<li><a href="javascript:;" onclick="display('vwap');">VWAP</a></li>
<li class="stat_tblcl"><a href="javascript:;" class="">SMA </a> <span class="ln_arw"></span></li>
<div class="tog_cont_bse" style="display: none;">
<div class="bg_blu">
<div class="clearfix bxtxt"><span class="or_bul FL"></span>
<div class="Ohidden"><a href="javascript:;" onclick="display('30d');">30 DMA</a></div>
</div>
<div class="clearfix bxtxt"><span class="or_bul FL"></span>
<div class="Ohidden"><a href="javascript:;" onclick="display('50d');">50 DMA</a></div>
</div>
<div class="clearfix bxtxt"><span class="or_bul FL"></span>
<div class="Ohidden"><a href="javascript:;" onclick="display('150d');">150 DMA</a></div>
</div>
<div class="clearfix bxtxt"><span class="or_bul FL"></span>
<div class="Ohidden"><a href="javascript:;" onclick="display('200d');">200 DMA</a></div>
</div>
</div>
</div>
<li><a href="javascript:;" onclick="display('del');">Deliverables</a></li>
<li><a href="javascript:;" onclick="display('pe');">P/E</a></li>
<li><a href="javascript:;" onclick="display('pb');">P/B</a></li>
</ul>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/diversified/hindustanaeronauticsltd/HAL" style="color:#333">Hindustan Aeron</a>
<div class="disin PR tipshow"><span class="ic_plusor"></span>
<div class="stagetool wd190 dnone">
<div class="PA5 CTR op_gl12">Add to</div>
<div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('HAL','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('HAL','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
<span class="arrwodn"></span> </div>
</div>
</span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE066F01012','BSE','541154');"></span><div class="MT5"><div class="disin PR tolhov">
<span class="ic_graphsp ML5"></span>
<div class="stagetool tooltip1 PB5 dnone">
<h1>ACTIONS</h1>
<div class="PA10">
<ul><li><span>Hindustan Aeron closes above 200-Day Moving Average of 723.47 today.</span></li></ul>
</div>
<span style="left:2px;" class="arrwodn"></span>
</div>
</div></div></td>
<td width="75" align="right">637.40</td>
<td width="75" align="right">637.40</td>
<td width="80" align="right">615.00</td>
<td width="85" align="right">620.60</td>
<td width="80" align="right">643.90</td>
<td class="red" width="85" align="right">-23.30</td>
<td class="red" width="75" align="right">-3.62</td>
<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>5-Day</td>
<td align="right"><strong>2990.80</strong></td>
</tr>
<tr>
<td>10-Day</td>
<td align="right"><strong>4437.40</strong></td>
</tr>
<tr>
<td>30-Day</td>
<td align="right"><strong>3791.73</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>4027</td>
<td class="30d" style="display: none;" width="90" align="right">745.38</td>
<td class="50d" style="display: none;" width="90" align="right">762.54</td>
<td class="150d" style="display: none;" width="90" align="right">735.14</td>
<td class="200d" style="display: none;" width="90" align="right">724.06</td>
<td class="pe" style="display: none;" width="90" align="right">7.43</td>
<td class="pb" style="display: none;" width="90" align="right">1.91</td>
<td class="performance" width="90" align="right"><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">02-Mar-20</p>
<p class="blktxt13"><strong>666.10</strong> <span class="redarw15"></span> <span class="">-18.45 (-2.7%)</span></p>
</div>
</div><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">03-Mar-20</p>
<p class="blktxt13"><strong>672.85</strong> <span class="grnarw14"></span> <span class="">6.75 (1.01%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">04-Mar-20</p>
<p class="blktxt13"><strong>644.90</strong> <span class="redarw15"></span> <span class="">-27.95 (-4.15%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">05-Mar-20</p>
<p class="blktxt13"><strong>643.90</strong> <span class="redarw15"></span> <span class="">-1 (-0.16%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">06-Mar-20</p>
<p class="blktxt13"><strong>620.60</strong> <span class="redarw15"></span> <span class="">-23.3 (-3.62%)</span></p>
</div>
</div></td>
<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>3-Day</td>
<td align="right"><strong>44.97%</strong></td>
</tr>
<tr>
<td>5-Day</td>
<td align="right"><strong>47.74%</strong></td>
</tr>
<tr>
<td>8-Day</td>
<td align="right"><strong>29.57%</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>58.87</td>
<td class="uc" style="display: none;" width="90" align="right">7.73</td>
<td class="lc" style="display: none;" width="90" align="right">5.15</td>
<td class="vwap" style="display: none;" width="90" align="right">627.53</td>
</tr>
<tr>
<td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/transportlogistics/taalenterprises/TAA01" style="color:#333">TAAL Enterprise</a>
<div class="disin PR tipshow"><span class="ic_plusor"></span>
<div class="stagetool wd190 dnone">
<div class="PA5 CTR op_gl12">Add to</div>
<div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('TAA01','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('TAA01','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
<span class="arrwodn"></span> </div>
</div>
</span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE524T01011','BSE','539956');"></span><div class="MT5"><div class="disin PR tolhov">
<span class="ic_graphsp ML5"></span>
<div class="stagetool tooltip1 PB5 dnone">
<h1>ACTIONS</h1>
<div class="PA10">
<ul><li><span>TAAL Enterprise has hit 52wk low of Rs 145.10 on BSE</span></li></ul>
</div>
<span style="left:2px;" class="arrwodn"></span>
</div>
</div></div></td>
<td width="75" align="right">140.00</td>
<td width="75" align="right">140.00</td>
<td width="80" align="right">130.00</td>
<td width="85" align="right">139.20</td>
<td width="80" align="right">150.90</td>
<td class="red" width="85" align="right">-11.70</td>
<td class="red" width="75" align="right">-7.75</td>
<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>5-Day</td>
<td align="right"><strong>1484.00</strong></td>
</tr>
<tr>
<td>10-Day</td>
<td align="right"><strong>1577.40</strong></td>
</tr>
<tr>
<td>30-Day</td>
<td align="right"><strong>2017.33</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>1247</td>
<td class="30d" style="display: none;" width="90" align="right">193.44</td>
<td class="50d" style="display: none;" width="90" align="right">191.63</td>
<td class="150d" style="display: none;" width="90" align="right">205.98</td>
<td class="200d" style="display: none;" width="90" align="right">221.23</td>
<td class="pe" style="display: none;" width="90" align="right">9.96</td>
<td class="pb" style="display: none;" width="90" align="right">2.52</td>
<td class="performance" width="90" align="right"><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">02-Mar-20</p>
<p class="blktxt13"><strong>160.00</strong> <span class="redarw15"></span> <span class="">-0.3 (-0.19%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">03-Mar-20</p>
<p class="blktxt13"><strong>157.05</strong> <span class="redarw15"></span> <span class="">-2.95 (-1.84%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">04-Mar-20</p>
<p class="blktxt13"><strong>142.40</strong> <span class="redarw15"></span> <span class="">-14.65 (-9.33%)</span></p>
</div>
</div><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">05-Mar-20</p>
<p class="blktxt13"><strong>150.90</strong> <span class="grnarw14"></span> <span class="">8.5 (5.97%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">06-Mar-20</p>
<p class="blktxt13"><strong>139.20</strong> <span class="redarw15"></span> <span class="">-11.7 (-7.75%)</span></p>
</div>
</div></td>
<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>3-Day</td>
<td align="right"><strong>74.66%</strong></td>
</tr>
<tr>
<td>5-Day</td>
<td align="right"><strong>71.20%</strong></td>
</tr>
<tr>
<td>8-Day</td>
<td align="right"><strong>74.17%</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>82.88</td>
<td class="uc" style="display: none;" width="90" align="right">1.81</td>
<td class="lc" style="display: none;" width="90" align="right">1.21</td>
<td class="vwap" style="display: none;" width="90" align="right">135.34</td>
</tr>
<tr>
<td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/miscellaneous/tanejaaerospaceaviation/TAA" style="color:#333">Taneja Aerospac</a>
<div class="disin PR tipshow"><span class="ic_plusor"></span>
<div class="stagetool wd190 dnone">
<div class="PA5 CTR op_gl12">Add to</div>
<div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('TAA','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('TAA','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
<span class="arrwodn"></span> </div>
</div>
</span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE692C01020','BSE','522229');"></span><div class="MT5"><div class="disin PR tolhov">
<span class="ic_graphsp ML5"></span>
<div class="stagetool tooltip1 PB5 dnone">
<h1>ACTIONS</h1>
<div class="PA10">
<ul><li><span>Only Buyers in Taneja Aerospac on BSE</span></li></ul>
</div>
<span style="left:2px;" class="arrwodn"></span>
</div>
</div></div></td>
<td width="75" align="right">22.10</td>
<td width="75" align="right">22.60</td>
<td width="80" align="right">20.35</td>
<td width="85" align="right">21.95</td>
<td width="80" align="right">23.10</td>
<td class="red" width="85" align="right">-1.15</td>
<td class="red" width="75" align="right">-4.98</td>
<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>5-Day</td>
<td align="right"><strong>9890.00</strong></td>
</tr>
<tr>
<td>10-Day</td>
<td align="right"><strong>10049.80</strong></td>
</tr>
<tr>
<td>30-Day</td>
<td align="right"><strong>24562.07</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>18040</td>
<td class="30d" style="display: none;" width="90" align="right">26.99</td>
<td class="50d" style="display: none;" width="90" align="right">26.55</td>
<td class="150d" style="display: none;" width="90" align="right">23.85</td>
<td class="200d" style="display: none;" width="90" align="right">24.70</td>
<td class="pe" style="display: none;" width="90" align="right">8.01</td>
<td class="pb" style="display: none;" width="90" align="right">0.59</td>
<td class="performance" width="90" align="right"><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">02-Mar-20</p>
<p class="blktxt13"><strong>24.90</strong> <span class="grnarw14"></span> <span class="">0.55 (2.26%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">03-Mar-20</p>
<p class="blktxt13"><strong>24.40</strong> <span class="redarw15"></span> <span class="">-0.5 (-2.01%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">04-Mar-20</p>
<p class="blktxt13"><strong>23.55</strong> <span class="redarw15"></span> <span class="">-0.85 (-3.48%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">05-Mar-20</p>
<p class="blktxt13"><strong>23.10</strong> <span class="redarw15"></span> <span class="">-0.45 (-1.91%)</span></p>
</div>
</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
<div class="dypop">
<p class="op12_gy">06-Mar-20</p>
<p class="blktxt13"><strong>21.95</strong> <span class="redarw15"></span> <span class="">-1.15 (-4.98%)</span></p>
</div>
</div></td>
<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
<span class="FR icon_info"></span>
<div class="tooltist_cnt">
<div class="tooltip2">
<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody><tr>
<td>3-Day</td>
<td align="right"><strong>84.25%</strong></td>
</tr>
<tr>
<td>5-Day</td>
<td align="right"><strong>83.57%</strong></td>
</tr>
<tr>
<td>8-Day</td>
<td align="right"><strong>81.94%</strong></td>
</tr>
</tbody></table>
</div>
</div>
</div>71.70</td>
<td class="uc" style="display: none;" width="90" align="right">0.28</td>
<td class="lc" style="display: none;" width="90" align="right">0.19</td>
<td class="vwap" style="display: none;" width="90" align="right">21.76</td>
</tr>
</tbody>
</table>
</div>
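Basically, the page is loaded via JavaScript, so the requests module on its own cannot render the content that JavaScript generates after the page loads.
You can use selenium for this kind of task. Alternatively, you can use HTMLSession from the requests_html module, which renders the JavaScript on the fly.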
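For the requests_html route mentioned above, a minimal sketch might look like the following. The function name `fetch_rendered_tables` and the 20-second timeout are my own choices for illustration, not something from the original answer, and I have not confirmed which table index holds the stock list in the rendered page. Note that `render()` downloads a headless Chromium the first time it is called.

```python
from io import StringIO

URL = ("https://www.moneycontrol.com/stocks/marketstats/"
       "industry-classification/bse/aerospace-defence.html")


def fetch_rendered_tables(url, timeout=20):
    """Return every <table> on the JavaScript-rendered page as a DataFrame.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """
    # Imported here so the third-party packages are only needed when the
    # function is actually called.
    import pandas as pd
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # render() executes the page's JavaScript in headless Chromium and
        # replaces response.html with the rendered document.
        response.html.render(timeout=timeout)
        # Wrap the markup in StringIO: newer pandas versions warn when
        # read_html is given a literal HTML string.
        return pd.read_html(StringIO(response.html.html))
    finally:
        session.close()


# Example call (needs network access and the requests_html package):
#     tables = fetch_rendered_tables(URL)
#     print(tables[0])
```

The Selenium version of the same idea follows.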
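# Selenium: drive a headless Firefox, then parse the rendered HTML with pandas.
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')  # run Firefox without opening a window
driver = webdriver.Firefox(options=options)
try:
    driver.get("https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html")
    # page_source is the DOM after JavaScript has run; the first table is the stock list.
    # StringIO avoids the warning newer pandas versions emit for literal HTML strings.
    df = pd.read_html(StringIO(driver.page_source))[0]
finally:
    driver.quit()  # always release the browser, even if parsing fails
print(df)
df.to_csv("result.csv", index=False)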
Output:
Company Name ... 5 Day Performance -18.45 (-2.7%) 03-Mar-20 6... 5 Day Performance Volume Lower Circuit Upper Circuit VWAP SMA Deliver -0.3 (-0.19%) 03-Mar-20 15...ables P/E P/B 0.55 (2.26%) 03-Mar-20 24.4...
0 Hindustan Aeron Add to Watchlist | Portfolio... ... 02-Mar-20 666.10 -18.45 (-2.7%) 03-Mar-20 6...
1 TAAL Enterprise Add to Watchlist | Portfolio... ... 02-Mar-20 160.00 -0.3 (-0.19%) 03-Mar-20 15...
2 Taneja Aerospac Add to Watchlist | Portfolio... ... 02-Mar-20 24.90 0.55 (2.26%) 03-Mar-20 24.4...
[3 rows x 9 columns]
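Separately from the rendering question, the three BeautifulSoup attempts in the question would not match even on fully rendered HTML: `find_all()` treats its first argument as a tag name rather than a CSS selector (CSS selectors go to `select()`), the dotted string `'bsr_table.hist_tbl_hm.PR.Ohidden'` is not a class value that appears in the markup, `table.find('tr')` returns a single row rather than a list of rows, and `.text` is a property, so `i.text()` raises a `TypeError`. The header and first column in the output above also look garbled, which appears to come from the tooltip tables and dropdown nested inside the cells. A sketch of one way around both problems is to select only the top-level rows and read the first eight cells of each. The `SAMPLE_HTML` below is a trimmed stand-in for the excerpt in the question, and `extract_rows` is a hypothetical helper; it assumes the `<tbody>` element is present as in that excerpt and that the numeric cells parse as floats once commas are removed.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the rendered page: one company row plus a nested
# tooltip table, mirroring the structure of the excerpt in the question.
SAMPLE_HTML = """
<div class="bsr_table hist_tbl_hm PR Ohidden">
<table>
<thead><tr><th>Company Name</th><th>Open</th><th>High</th><th>Low</th>
<th>Last Price</th><th>Prev Price</th><th>Change</th><th>% Chg</th></tr></thead>
<tbody>
<tr>
<td class="PR"><span class="gld13 disin"><a href="#">Hindustan Aeron</a></span></td>
<td>637.40</td><td>637.40</td><td>615.00</td><td>620.60</td>
<td>643.90</td><td class="red">-23.30</td><td class="red">-3.62</td>
<td class="vol"><table><tbody><tr><td>5-Day</td><td>2990.80</td></tr></tbody></table>4027</td>
</tr>
</tbody>
</table>
</div>
"""

COLUMNS = ["Company Name", "Open", "High", "Low",
           "Last Price", "Prev Price", "Change", "% Chg"]


def extract_rows(html):
    """Return one dict per company row, ignoring nested tooltip tables."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # select() accepts a CSS selector; the child combinator (>) keeps only
    # rows of the outer table and skips rows of the nested tables.
    for tr in soup.select("div.bsr_table.hist_tbl_hm > table > tbody > tr"):
        cells = tr.find_all("td", recursive=False)  # direct cells only
        # The first link in the first cell holds the company name.
        name = cells[0].a.get_text(strip=True)
        # Cells 1..7 are Open through % Chg; strip thousands separators.
        values = [float(td.get_text(strip=True).replace(",", ""))
                  for td in cells[1:8]]
        rows.append(dict(zip(COLUMNS, [name] + values)))
    return rows


print(extract_rows(SAMPLE_HTML))
```

To use this on the live page, pass `driver.page_source` from the Selenium code instead of `SAMPLE_HTML`; the resulting list of dicts can be handed to `pd.DataFrame(...)` if a DataFrame is wanted.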