What is wrong with my BeautifulSoup code?


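I am trying to scrape a table with BeautifulSoup. Three of my first four attempts did not work, and I don't know why!

In the fourth attempt I used pandas, but the result is no longer specific to the table I want.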

import requests
import bs4

res = requests.get(
    "https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html")

soup = bs4.BeautifulSoup(res.text, 'lxml')

# 1st try by copy selector from inspect element
table = soup.find_all(
    '#mc_content > section > section > div.clearfix.stat_container > div.columnst.FR.wbg.brdwht > div > div.bsr_table.hist_tbl_hm.PR.Ohidden')
print(table)
# 2nd try by specifically writing class by attribute method
table = soup.find_all(
    'div', attrs={'class': 'bsr_table.hist_tbl_hm.PR.Ohidden'})
print(table)
# 3rd conventional style
table = soup.find('table')
table_rows = table.find('tr')
for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text() for i in td]
        print(td)
import pandas as pd

# 4th by pandas

dfs = pd.read_html(
    'https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html')
for df in dfs:
        print(df)

The output I get:

0   Hindustan Aeron  Add to  Watchlist | Portfolio...  ...      627.53
1                                               5-Day  ...         NaN
2                                              10-Day  ...         NaN
3                                              30-Day  ...         NaN
4                                               3-Day  ...         NaN
5                                               5-Day  ...         NaN
6                                               8-Day  ...         NaN
7   TAAL Enterprise  Add to  Watchlist | Portfolio...  ...      135.34
8                                               5-Day  ...         NaN
9                                              10-Day  ...         NaN
10                                             30-Day  ...         NaN
11                                              3-Day  ...         NaN
12                                              5-Day  ...         NaN
13                                              8-Day  ...         NaN
14  Taneja Aerospac  Add to  Watchlist | Portfolio...  ...       21.76
15                                              5-Day  ...         NaN
16                                             10-Day  ...         NaN
17                                             30-Day  ...         NaN
18                                              3-Day  ...         NaN
19                                              5-Day  ...         NaN
20                                              8-Day  ...         NaN

I want to extract the following table:

<div class="bsr_table hist_tbl_hm PR Ohidden">
				<div class="data_table_ajax_loading" id="data_table_ajax_loading" style="display: none;"></div>
              <table width="100%" cellspacing="0" cellpadding="0" border="0">
                <thead>
                  <tr>
                    <th class="TAL" width="155" valign="top" align="left"><a href="javascript:callsort(1)" class="bl_12" style="color: #ffffff;"><b>Company Name</b></a></th>
                    <th width="75">Open</th>
                    <th width="75">High</th>
                    <th width="80">Low</th>
                    <th width="85">Last Price</th>
                    <th width="80">Prev Price</th>
                    <th width="85">Change</th>
                    <th width="75"><a href="javascript:callsort(2)" class="bl_12" style="color: #ffffff;"><b>% Chg</b></a></th>
					<th class="PR" width="90"><span id="th_name">5 Day Performance</span>
                      <div class="dropdownchng"> <span class="bluarw MT10 MR2"></span>
                        <ul>
                          <li><a href="javascript:;" onclick="display('performance');">5 Day Performance</a></li>
                          <li><a href="javascript:;" onclick="display('vol');">Volume</a></li>
                          <li><a href="javascript:;" onclick="display('lc');">Lower Circuit</a></li>
                          <li><a href="javascript:;" onclick="display('uc');">Upper Circuit</a></li>
                          <li><a href="javascript:;" onclick="display('vwap');">VWAP</a></li>
                          <li class="stat_tblcl"><a href="javascript:;" class="">SMA </a> <span class="ln_arw"></span></li>
                          <div class="tog_cont_bse" style="display: none;">
                            <div class="bg_blu">
                              <div class="clearfix bxtxt"><span class="or_bul FL"></span>
                                <div class="Ohidden"><a href="javascript:;" onclick="display('30d');">30 DMA</a></div>
                              </div>
                              <div class="clearfix bxtxt"><span class="or_bul FL"></span>
                                <div class="Ohidden"><a href="javascript:;" onclick="display('50d');">50 DMA</a></div>
                              </div>
                              <div class="clearfix bxtxt"><span class="or_bul FL"></span>
                                <div class="Ohidden"><a href="javascript:;" onclick="display('150d');">150 DMA</a></div>
                              </div>
                              <div class="clearfix bxtxt"><span class="or_bul FL"></span>
                                <div class="Ohidden"><a href="javascript:;" onclick="display('200d');">200 DMA</a></div>
                              </div>
                            </div>
                          </div>
                          <li><a href="javascript:;" onclick="display('del');">Deliverables</a></li>
                          <li><a href="javascript:;" onclick="display('pe');">P/E</a></li>
                          <li><a href="javascript:;" onclick="display('pb');">P/B</a></li>
                        </ul>
                      </div>
					</th>
                  </tr>
                </thead>
                <tbody>
				                  <tr>
                    <td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/diversified/hindustanaeronauticsltd/HAL" style="color:#333">Hindustan Aeron</a>
					                      <div class="disin PR tipshow"><span class="ic_plusor"></span>
                        <div class="stagetool wd190 dnone">
                          <div class="PA5 CTR op_gl12">Add to</div>
                          <div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('HAL','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('HAL','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
                          <span class="arrwodn"></span> </div>
                      </div>
					                        </span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE066F01012','BSE','541154');"></span><div class="MT5"><div class="disin PR tolhov">
						<span class="ic_graphsp ML5"></span>
							<div class="stagetool tooltip1 PB5 dnone">
								<h1>ACTIONS</h1>
								<div class="PA10">
								  <ul><li><span>Hindustan Aeron closes above 200-Day Moving Average of 723.47 today.</span></li></ul>
						</div>
							<span style="left:2px;" class="arrwodn"></span> 
						</div>
					</div></div></td>
                    <td width="75" align="right">637.40</td>
                    <td width="75" align="right">637.40</td>
                    <td width="80" align="right">615.00</td>
                    <td width="85" align="right">620.60</td>
                    <td width="80" align="right">643.90</td>
                    <td class="red" width="85" align="right">-23.30</td>
                    <td class="red" width="75" align="right">-3.62</td>
											<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
								  <span class="FR icon_info"></span>
									<div class="tooltist_cnt">
									  <div class="tooltip2">
										<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
										<table width="100%" cellspacing="0" cellpadding="0" border="0">
											  <tbody><tr>
												<td>5-Day</td>
												<td align="right"><strong>2990.80</strong></td>
											  </tr>
											  <tr>
												<td>10-Day</td>
												<td align="right"><strong>4437.40</strong></td>
											  </tr>
											  <tr>
												<td>30-Day</td>
												<td align="right"><strong>3791.73</strong></td>
											  </tr>
										  </tbody></table>
									  </div>
									 </div>
								</div>4027</td>
											<td class="30d" style="display: none;" width="90" align="right">745.38</td>
											<td class="50d" style="display: none;" width="90" align="right">762.54</td>
											<td class="150d" style="display: none;" width="90" align="right">735.14</td>
											<td class="200d" style="display: none;" width="90" align="right">724.06</td>
											<td class="pe" style="display: none;" width="90" align="right">7.43</td>
											<td class="pb" style="display: none;" width="90" align="right">1.91</td>
											<td class="performance" width="90" align="right"><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">02-Mar-20</p>
						  <p class="blktxt13"><strong>666.10</strong> <span class="redarw15"></span> <span class="">-18.45 (-2.7%)</span></p>
						</div>
					</div><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">03-Mar-20</p>
						  <p class="blktxt13"><strong>672.85</strong> <span class="grnarw14"></span> <span class="">6.75 (1.01%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">04-Mar-20</p>
						  <p class="blktxt13"><strong>644.90</strong> <span class="redarw15"></span> <span class="">-27.95 (-4.15%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">05-Mar-20</p>
						  <p class="blktxt13"><strong>643.90</strong> <span class="redarw15"></span> <span class="">-1 (-0.16%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">06-Mar-20</p>
						  <p class="blktxt13"><strong>620.60</strong> <span class="redarw15"></span> <span class="">-23.3 (-3.62%)</span></p>
						</div>
					</div></td>
											<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
							  <span class="FR icon_info"></span>
								<div class="tooltist_cnt">
								  <div class="tooltip2">
									<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
									<table width="100%" cellspacing="0" cellpadding="0" border="0">
										 
										  <tbody><tr>
											<td>3-Day</td>
											<td align="right"><strong>44.97%</strong></td>
										  </tr>
										  
										  <tr>
											<td>5-Day</td>
											<td align="right"><strong>47.74%</strong></td>
										  </tr>
										  
										  <tr>
											<td>8-Day</td>
											<td align="right"><strong>29.57%</strong></td>
										  </tr>
										  
									  </tbody></table>
								  </div>
								  </div>
							</div>58.87</td>
											<td class="uc" style="display: none;" width="90" align="right">7.73</td>
											<td class="lc" style="display: none;" width="90" align="right">5.15</td>
											<td class="vwap" style="display: none;" width="90" align="right">627.53</td>
					                  </tr>
                                  <tr>
                    <td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/transportlogistics/taalenterprises/TAA01" style="color:#333">TAAL Enterprise</a>
					                      <div class="disin PR tipshow"><span class="ic_plusor"></span>
                        <div class="stagetool wd190 dnone">
                          <div class="PA5 CTR op_gl12">Add to</div>
                          <div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('TAA01','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('TAA01','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
                          <span class="arrwodn"></span> </div>
                      </div>
					                        </span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE524T01011','BSE','539956');"></span><div class="MT5"><div class="disin PR tolhov">
						<span class="ic_graphsp ML5"></span>
							<div class="stagetool tooltip1 PB5 dnone">
								<h1>ACTIONS</h1>
								<div class="PA10">
								  <ul><li><span>TAAL Enterprise has hit 52wk low of Rs 145.10 on BSE</span></li></ul>
						</div>
							<span style="left:2px;" class="arrwodn"></span> 
						</div>
					</div></div></td>
                    <td width="75" align="right">140.00</td>
                    <td width="75" align="right">140.00</td>
                    <td width="80" align="right">130.00</td>
                    <td width="85" align="right">139.20</td>
                    <td width="80" align="right">150.90</td>
                    <td class="red" width="85" align="right">-11.70</td>
                    <td class="red" width="75" align="right">-7.75</td>
											<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
								  <span class="FR icon_info"></span>
									<div class="tooltist_cnt">
									  <div class="tooltip2">
										<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
										<table width="100%" cellspacing="0" cellpadding="0" border="0">
											  <tbody><tr>
												<td>5-Day</td>
												<td align="right"><strong>1484.00</strong></td>
											  </tr>
											  <tr>
												<td>10-Day</td>
												<td align="right"><strong>1577.40</strong></td>
											  </tr>
											  <tr>
												<td>30-Day</td>
												<td align="right"><strong>2017.33</strong></td>
											  </tr>
										  </tbody></table>
									  </div>
									 </div>
								</div>1247</td>
											<td class="30d" style="display: none;" width="90" align="right">193.44</td>
											<td class="50d" style="display: none;" width="90" align="right">191.63</td>
											<td class="150d" style="display: none;" width="90" align="right">205.98</td>
											<td class="200d" style="display: none;" width="90" align="right">221.23</td>
											<td class="pe" style="display: none;" width="90" align="right">9.96</td>
											<td class="pb" style="display: none;" width="90" align="right">2.52</td>
											<td class="performance" width="90" align="right"><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">02-Mar-20</p>
						  <p class="blktxt13"><strong>160.00</strong> <span class="redarw15"></span> <span class="">-0.3 (-0.19%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">03-Mar-20</p>
						  <p class="blktxt13"><strong>157.05</strong> <span class="redarw15"></span> <span class="">-2.95 (-1.84%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">04-Mar-20</p>
						  <p class="blktxt13"><strong>142.40</strong> <span class="redarw15"></span> <span class="">-14.65 (-9.33%)</span></p>
						</div>
					</div><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">05-Mar-20</p>
						  <p class="blktxt13"><strong>150.90</strong> <span class="grnarw14"></span> <span class="">8.5 (5.97%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">06-Mar-20</p>
						  <p class="blktxt13"><strong>139.20</strong> <span class="redarw15"></span> <span class="">-11.7 (-7.75%)</span></p>
						</div>
					</div></td>
											<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
							  <span class="FR icon_info"></span>
								<div class="tooltist_cnt">
								  <div class="tooltip2">
									<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
									<table width="100%" cellspacing="0" cellpadding="0" border="0">
										 
										  <tbody><tr>
											<td>3-Day</td>
											<td align="right"><strong>74.66%</strong></td>
										  </tr>
										  
										  <tr>
											<td>5-Day</td>
											<td align="right"><strong>71.20%</strong></td>
										  </tr>
										  
										  <tr>
											<td>8-Day</td>
											<td align="right"><strong>74.17%</strong></td>
										  </tr>
										  
									  </tbody></table>
								  </div>
								  </div>
							</div>82.88</td>
											<td class="uc" style="display: none;" width="90" align="right">1.81</td>
											<td class="lc" style="display: none;" width="90" align="right">1.21</td>
											<td class="vwap" style="display: none;" width="90" align="right">135.34</td>
					                  </tr>
                                  <tr>
                    <td class="PR" width="155"><span class="gld13 disin"><a href="https://www.moneycontrol.com/india/stockpricequote/miscellaneous/tanejaaerospaceaviation/TAA" style="color:#333">Taneja Aerospac</a>
					                      <div class="disin PR tipshow"><span class="ic_plusor"></span>
                        <div class="stagetool wd190 dnone">
                          <div class="PA5 CTR op_gl12">Add to</div>
                          <div class="add_bubg CTR"> <a href="javascript:;" onclick="javascript:chkbx_val('TAA','1');" class="bl13 watch"><span class="ic_watchlist"></span>Watchlist</a> | <a href="javascript:;" onclick="javascript:chkbx_val('TAA','5');" class="bl13 port"><span class="ic_portfolio"></span>Portfolio</a> </div>
                          <span class="arrwodn"></span> </div>
                      </div>
					                        </span> <span class="ic_tradenwicn btn_tradep_pop ML2" onclick="tradepopup('INE692C01020','BSE','522229');"></span><div class="MT5"><div class="disin PR tolhov">
						<span class="ic_graphsp ML5"></span>
							<div class="stagetool tooltip1 PB5 dnone">
								<h1>ACTIONS</h1>
								<div class="PA10">
								  <ul><li><span>Only Buyers in Taneja Aerospac on BSE</span></li></ul>
						</div>
							<span style="left:2px;" class="arrwodn"></span> 
						</div>
					</div></div></td>
                    <td width="75" align="right">22.10</td>
                    <td width="75" align="right">22.60</td>
                    <td width="80" align="right">20.35</td>
                    <td width="85" align="right">21.95</td>
                    <td width="80" align="right">23.10</td>
                    <td class="red" width="85" align="right">-1.15</td>
                    <td class="red" width="75" align="right">-4.98</td>
											<td class="vol" style="display: none;" width="90" align="right"><div class="stp_info">
								  <span class="FR icon_info"></span>
									<div class="tooltist_cnt">
									  <div class="tooltip2">
										<div class="title2 MB5 TAC">AVERAGE VOLUME</div>
										<table width="100%" cellspacing="0" cellpadding="0" border="0">
											  <tbody><tr>
												<td>5-Day</td>
												<td align="right"><strong>9890.00</strong></td>
											  </tr>
											  <tr>
												<td>10-Day</td>
												<td align="right"><strong>10049.80</strong></td>
											  </tr>
											  <tr>
												<td>30-Day</td>
												<td align="right"><strong>24562.07</strong></td>
											  </tr>
										  </tbody></table>
									  </div>
									 </div>
								</div>18040</td>
											<td class="30d" style="display: none;" width="90" align="right">26.99</td>
											<td class="50d" style="display: none;" width="90" align="right">26.55</td>
											<td class="150d" style="display: none;" width="90" align="right">23.85</td>
											<td class="200d" style="display: none;" width="90" align="right">24.70</td>
											<td class="pe" style="display: none;" width="90" align="right">8.01</td>
											<td class="pb" style="display: none;" width="90" align="right">0.59</td>
											<td class="performance" width="90" align="right"><div class="changea "> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">02-Mar-20</p>
						  <p class="blktxt13"><strong>24.90</strong> <span class="grnarw14"></span> <span class="">0.55 (2.26%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">03-Mar-20</p>
						  <p class="blktxt13"><strong>24.40</strong> <span class="redarw15"></span> <span class="">-0.5 (-2.01%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">04-Mar-20</p>
						  <p class="blktxt13"><strong>23.55</strong> <span class="redarw15"></span> <span class="">-0.85 (-3.48%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">05-Mar-20</p>
						  <p class="blktxt13"><strong>23.10</strong> <span class="redarw15"></span> <span class="">-0.45 (-1.91%)</span></p>
						</div>
					</div><div class="changea red"> <span style="left:-3px; bottom:5px;" class="arrwodn"></span>
						<div class="dypop">
						  <p class="op12_gy">06-Mar-20</p>
						  <p class="blktxt13"><strong>21.95</strong> <span class="redarw15"></span> <span class="">-1.15 (-4.98%)</span></p>
						</div>
					</div></td>
											<td class="del" style="display: none;" width="90" align="right"><div class="stp_info">
							  <span class="FR icon_info"></span>
								<div class="tooltist_cnt">
								  <div class="tooltip2">
									<div class="title2 MB5 TAC">DELIVERY AVERAGES</div>
									<table width="100%" cellspacing="0" cellpadding="0" border="0">
										 
										  <tbody><tr>
											<td>3-Day</td>
											<td align="right"><strong>84.25%</strong></td>
										  </tr>
										  
										  <tr>
											<td>5-Day</td>
											<td align="right"><strong>83.57%</strong></td>
										  </tr>
										  
										  <tr>
											<td>8-Day</td>
											<td align="right"><strong>81.94%</strong></td>
										  </tr>
										  
									  </tbody></table>
								  </div>
								  </div>
							</div>71.70</td>
											<td class="uc" style="display: none;" width="90" align="right">0.28</td>
											<td class="lc" style="display: none;" width="90" align="right">0.19</td>
											<td class="vwap" style="display: none;" width="90" align="right">21.76</td>
					                  </tr>
                				                </tbody>
              </table>
            </div>
python pandas dataframe beautifulsoup css-selectors
1 Answer

Basically, the page is loaded via JavaScript, so the requests module on its own cannot render the JavaScript-generated content after the page loads.
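One way to check whether a given page really needs JavaScript is to look for the table rows in the raw response before reaching for a browser. The following is only a minimal sketch: the helper name `has_rows`, the default selector, and both sample strings are hypothetical and chosen for illustration.

```python
from bs4 import BeautifulSoup


def has_rows(html, selector="div.bsr_table table tbody tr"):
    """Return True if the markup already contains at least one matching row."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(selector) is not None


# Hypothetical samples: a server-rendered table and an empty JavaScript placeholder.
static_page = (
    '<div class="bsr_table"><table><tbody>'
    "<tr><td>637.40</td></tr>"
    "</tbody></table></div>"
)
js_placeholder = '<div class="bsr_table"></div><script>loadTable()</script>'

print(has_rows(static_page))     # True
print(has_rows(js_placeholder))  # False
```

With the code from the question this would be called as `has_rows(res.text)`. If it returns True, the rows are already in the response and a plain parser is enough; if it returns False, the rows are probably filled in later by a script.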

You can use selenium for this kind of task. Alternatively, you can use HTMLSession from the requests_html module to render the JavaScript on the fly.
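For the HTMLSession option, a minimal sketch could look like the following. It assumes the requests_html package is installed (its `render()` call downloads a headless Chromium on first use); the function name `fetch_rendered_html`, the optional `session` parameter, and the 20-second timeout are illustrative choices, not part of the original answer. The Selenium approach is the one used in the code after it.

```python
def fetch_rendered_html(url, session=None, timeout=20):
    """Return the page HTML after its JavaScript has been executed.

    `session` is optional so a caller (or a test) can pass in its own
    session object; by default an HTMLSession is created and closed here.
    """
    owns_session = session is None
    if owns_session:
        # Imported lazily so this module loads even without requests_html.
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        response = session.get(url)
        # render() runs the page in headless Chromium and replaces
        # response.html with the resulting markup.
        response.html.render(timeout=timeout)
        return response.html.html
    finally:
        if owns_session:
            session.close()
```

The returned string can then be handed to BeautifulSoup, or to `pd.read_html` (wrapped in `io.StringIO` on recent pandas versions).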

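from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox without opening a window.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

try:
    driver.get("https://www.moneycontrol.com/stocks/marketstats/industry-classification/bse/aerospace-defence.html")
    # Recent pandas versions expect literal HTML to be wrapped in a
    # file-like object rather than passed as a plain string.
    df = pd.read_html(StringIO(driver.page_source))[0]
finally:
    # Close the browser even if loading or parsing fails.
    driver.quit()

print(df)
df.to_csv("result.csv", index=False)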

Output:

                                        Company Name  ...  5 Day Performance -18.45 (-2.7%)  03-Mar-20  6...  5 Day Performance  Volume  Lower Circuit  Upper Circuit  VWAP  SMA Deliver -0.3 (-0.19%)  03-Mar-20  15...ables  P/E  P/B                                                             0.55 (2.26%)  03-Mar-20  24.4...
0  Hindustan Aeron  Add to  Watchlist | Portfolio...  ...  02-Mar-20  666.10 -18.45 (-2.7%)  03-Mar-20  6...
1  TAAL Enterprise  Add to  Watchlist | Portfolio...  ...  02-Mar-20  160.00 -0.3 (-0.19%)  03-Mar-20  15...
2  Taneja Aerospac  Add to  Watchlist | Portfolio...  ...  02-Mar-20  24.90 0.55 (2.26%)  03-Mar-20  24.4...

[3 rows x 9 columns]
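
Once the HTML is available (from requests or from `driver.page_source`), the first three attempts in the question can be adapted roughly as follows. This is a minimal sketch rather than a definitive fix: `SAMPLE_HTML` is a trimmed, hypothetical stand-in for the markup in the question, and `html.parser` is used only to avoid an extra dependency. It shows CSS selectors going through `select_one()` instead of `find_all()`, and `recursive=False` keeping the nested tooltip tables out of the rows.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for the markup shown in the question.
SAMPLE_HTML = """
<div class="bsr_table hist_tbl_hm PR Ohidden">
  <table>
    <thead><tr><th>Company Name</th><th>Open</th><th>High</th></tr></thead>
    <tbody>
      <tr>
        <td><a href="#">Hindustan Aeron</a><div class="tipshow">Add to Watchlist</div></td>
        <td>637.40</td>
        <td>637.40</td>
        <td class="vol" style="display: none;">
          <table><tbody><tr><td>5-Day</td><td>2990.80</td></tr></tbody></table>4027
        </td>
      </tr>
      <tr>
        <td><a href="#">TAAL Enterprise</a><div class="tipshow">Add to Watchlist</div></td>
        <td>140.00</td>
        <td>140.00</td>
        <td class="vol" style="display: none;">
          <table><tbody><tr><td>5-Day</td><td>1484.00</td></tr></tbody></table>1247
        </td>
      </tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attempts 1 and 2: a CSS selector belongs in select()/select_one(); find_all()
# treats its first argument as a tag name. Dots join class names only inside a
# selector; in the class attribute itself the names are separated by spaces.
container = soup.select_one("div.bsr_table.hist_tbl_hm.PR.Ohidden")
table = container.find("table")

# Attempt 3: find("tr") returns only the first row, so use find_all("tr").
# recursive=False limits the search to direct children, which skips the
# tooltip tables nested inside individual cells.
headers = [th.get_text(strip=True)
           for th in table.thead.tr.find_all("th", recursive=False)]

rows = []
for tr in table.tbody.find_all("tr", recursive=False):
    cells = tr.find_all("td", recursive=False)
    # Take the link text only, leaving out the "Add to Watchlist" tooltip.
    name = cells[0].a.get_text(strip=True)
    # .text is a property, so i.text() raises TypeError; get_text() is the method.
    values = [td.get_text(strip=True) for td in cells[1:len(headers)]]
    rows.append([name] + values)

print(headers)
for row in rows:
    print(row)
```

If a DataFrame is wanted, `rows` and `headers` can be passed to `pd.DataFrame(rows, columns=headers)`.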
