I'm building a web scraper and ran into a page containing the following hidden markup:
<div style="display:none; padding:3px 10px 5px;text-align:center;" id="dialogCookieInfo" title="taiwan high-speed rail" wicket:message="title=bookingdialog_3">
<div class="JCon">
<div class="TCon">
<div class="overDiffText">
<div style="text-align: left;">
<span> for better user experiences, bla bla <a target="_blank" class="c" style="color:#FF9900;" href="https://www.thsrc.com.tw/tw/Article/ArticleContent/d1fa3bcb-a016-47e2-88c6-7b7cbed00ed5?tabIndex=1">privacy protection</a>。</span>
</div>
</div>
<div class="action">
<table border="0" cellpadding="0" cellspacing="0" align="center">
<tr>
<td>
<input hidefocus="" name="confirm" id="btn-confirm" type="button" class="button_main" value="我同意"/>
</td>
</tr>
</table>
</div>
</div>
</div>
</div>
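Normally, when the page is rendered, this markup is displayed, and it actually appears on top of all the other elements. It effectively covers every other element until the user confirms or cancels. The problem is that BeautifulSoup does not return this tag properly when my program queries it. BeautifulSoup only tells me that the tag's style is "display:none" and reveals neither the tag's other attributes nor its children. But I need this tag to check whether it is the one that overlays all the other elements. Can anyone help me with this?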
Thanks very much for any replies.
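For reference, BeautifulSoup builds its tree from the raw HTML and does not evaluate CSS, so a display:none style does not by itself hide a tag's attributes or children from the parser. The sketch below assumes the dialog markup is actually present in the HTML passed to BeautifulSoup (a trimmed copy of the snippet above is used); if the dialog were inserted by JavaScript after page load, it would not be in the raw HTML fetched over HTTP at all.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the dialog markup from the question (text shortened for brevity)
html = '''<div style="display:none; padding:3px 10px 5px;text-align:center;" id="dialogCookieInfo" title="taiwan high-speed rail" wicket:message="title=bookingdialog_3">
<div class="JCon">
<div class="TCon"><span>for better user experiences, bla bla</span></div>
<div class="action"><input hidefocus="" name="confirm" id="btn-confirm" type="button" class="button_main" value="我同意"/></div>
</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
dialog = soup.find('div', id='dialogCookieInfo')

# The parser does not apply CSS, so every attribute of the hidden div is still available
print(dialog.attrs)

# Descendants are reachable as usual, e.g. the "I agree" (我同意) confirm button
button = dialog.find('input', id='btn-confirm')
print(button['value'])

# "class" is a multi-valued attribute, so BeautifulSoup returns it as a list
print([child.get('class') for child in dialog.find_all('div')])
```

If this works on the saved page source but not on what your scraper downloads, comparing the two HTML strings is a quick way to tell whether the dialog is in the served HTML or added later by a script.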
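Not sure this is exactly what you want, but hopefully it at least points you in the right direction. You can iterate over the <div> tags and check whether each one has a "style" attribute. If it does, check whether that attribute contains "display:none". When both are true, you can do whatever you need with that element.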
html = '''<div style="display:none; padding:3px 10px 5px;text-align:center;" id="dialogCookieInfo" title="taiwan high-speed rail" wicket:message="title=bookingdialog_3">
<div class="JCon">
<div class="TCon">
<div class="overDiffText">
<div style="text-align: left;">
<span> for better user experiences, bla bla <a target="_blank" class="c" style="color:#FF9900;" href="https://www.thsrc.com.tw/tw/Article/ArticleContent/d1fa3bcb-a016-47e2-88c6-7b7cbed00ed5?tabIndex=1">privacy protection</a>。</span>
</div>
</div>
<div class="action">
<table border="0" cellpadding="0" cellspacing="0" align="center">
<tr>
<td>
<input hidefocus="" name="confirm" id="btn-confirm" type="button" class="button_main" value="我同意"/>
</td>
</tr>
</table>
</div>
</div>
</div>
</div>'''
import bs4

soup = bs4.BeautifulSoup(html, 'html.parser')
div_display = soup.find_all('div')
for ele in div_display:
    # Tag.get() returns None when the attribute is missing, so no try/except is needed
    style = ele.get('style')
    if style is None:
        print('Element did not have "style" attribute')
    # Remove spaces so "display: none" matches as well as "display:none"
    elif 'display:none' in style.replace(' ', ''):
        print('Found "display:none"')
        # Do some stuff with this element
    else:
        print('Did not find "display:none"')
Output:
Found "display:none"
Element did not have "style" attribute
Element did not have "style" attribute
Element did not have "style" attribute
Did not find "display:none"
Element did not have "style" attribute
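A plain substring check is sensitive to how the declaration is written (spacing, letter case). A slightly more tolerant sketch parses the inline style attribute into property/value pairs first. The helper names parse_inline_style and hidden_divs are hypothetical, the parser is simplified (it does not handle values containing ";"), and only inline styles are considered, not rules from stylesheets or classes toggled by JavaScript.

```python
from bs4 import BeautifulSoup


def parse_inline_style(style):
    """Turn 'display:none; padding:3px' into {'display': 'none', 'padding': '3px'}.

    Simplified: splits on ';' and on the first ':' of each declaration,
    and lowercases names and values.
    """
    declarations = {}
    for part in style.split(';'):
        name, sep, value = part.partition(':')
        if sep:  # skip empty chunks such as the one after a trailing ';'
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def hidden_divs(soup):
    """Return every <div> whose inline style sets display to none."""
    return [
        div for div in soup.find_all('div', style=True)  # only divs that have a style attribute
        if parse_inline_style(div['style']).get('display') == 'none'
    ]


html = '''<div id="a" style="display: NONE; padding:3px"></div>
<div id="b" style="text-align: left;"></div>
<div id="c"></div>
<div id="d" style="display:none"></div>'''

soup = BeautifulSoup(html, 'html.parser')
print([div['id'] for div in hidden_divs(soup)])  # ['a', 'd']
```

Passing style=True to find_all lets BeautifulSoup skip divs without a style attribute, which replaces the per-element attribute check in the loop above.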