When the class name is repeated (BeautifulSoup)

Problem description
<div class="glance_ctn_responsive_left">
  <div id="userReviews" class="user_reviews">
    <div class="user_reviews_summary_row" onclick="window.location='#app_reviews_hash'" style="cursor: pointer;" data-tooltip-html="No user reviews" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
      <div class="subtitle column all">All Reviews:</div>
      <div class="summary column">No user reviews</div>
    </div>
  </div>
  <div class="release_date">
    <div class="subtitle column">Release Date:</div>
    <div class="date">2025</div>
  </div>
  <div class="dev_row">
    <div class="subtitle column">Developer:</div>
    <div class="summary column" id="developers_list">
      <a href="https://store.steampowered.com/curator/45188208?snr=1_5_9__2000">FromSoftware, Inc.</a>
    </div>
  </div>
  <div class="dev_row">
    <div class="subtitle column">Publisher:</div>
    <div class="summary column">
      <a href="https://store.steampowered.com/curator/45188208?snr=1_5_9__2000">FromSoftware, Inc.</a>, <a href="https://store.steampowered.com/curator/45188208?snr=1_5_9__2000">Bandai Namco Entertainment</a>
    </div>
    <div class="more_btn">+</div>
  </div>
</div>

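I am running this script:

from bs4 import BeautifulSoup
# soup is assumed to be a BeautifulSoup object already built from the page HTML above
publisher = soup.find('div', class_='dev_row')
publisher_name = publisher.text.strip() if publisher else "N/A"
print(publisher_name)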

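The problem is that I can't use any of the things I would normally use to identify the string:

  • The class "dev_row" is repeated twice in the soup, so I can't use it.
  • The tag "a" is repeated twice in the soup.
  • I can't use the link, because I run this script on multiple pages and the link changes every time.

Note that I'm quite new to this, so I may be missing something very obvious.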
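For reference, the symptom can be reproduced with a trimmed copy of the markup. This is only a minimal sketch: the `HTML` string and the shortened `href="#"` values are placeholders introduced for illustration, not the real page. It shows that `find` stops at the first `dev_row` (the Developer row), while `find_all` returns both rows in document order.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the markup above (links replaced by "#").
HTML = """
<div class="dev_row">
  <div class="subtitle column">Developer:</div>
  <div class="summary column" id="developers_list"><a href="#">FromSoftware, Inc.</a></div>
</div>
<div class="dev_row">
  <div class="subtitle column">Publisher:</div>
  <div class="summary column"><a href="#">FromSoftware, Inc.</a>, <a href="#">Bandai Namco Entertainment</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching element, here the Developer row.
first_row = soup.find("div", class_="dev_row")
first_label = first_row.find("div", class_="subtitle").get_text(strip=True)

# find_all() returns every matching element in document order.
labels = [
    row.find("div", class_="subtitle").get_text(strip=True)
    for row in soup.find_all("div", class_="dev_row")
]

print(first_label)
print(labels)
```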

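You can use CSS selectors with BeautifulSoup; that is the easiest way here.

I selected the summary div that is a direct child of a div with class dev_row and has no id attribute. The Developer div has an id (developers_list), so the one without an id is the publisher's div.

publisher = soup.select_one("div.dev_row > div.summary:not([id])")
publisher_name = publisher.text.strip() if publisher else "N/A"
print(publisher_name)

Alternatively, you can locate the row by its label text, Publisher:.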
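A self-contained sketch of both ideas may help when trying this out. It assumes a trimmed copy of the markup from the question (the `HTML` string and `href="#"` values are placeholders) and a hypothetical helper `names` added only for illustration. The first lookup uses the CSS selector; the second is one possible reading of the label-based approach, finding the `Publisher:` label div and stepping to its sibling summary div.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the markup from the question (links replaced by "#").
HTML = """
<div class="glance_ctn_responsive_left">
  <div class="dev_row">
    <div class="subtitle column">Developer:</div>
    <div class="summary column" id="developers_list">
      <a href="#">FromSoftware, Inc.</a>
    </div>
  </div>
  <div class="dev_row">
    <div class="subtitle column">Publisher:</div>
    <div class="summary column">
      <a href="#">FromSoftware, Inc.</a>, <a href="#">Bandai Namco Entertainment</a>
    </div>
    <div class="more_btn">+</div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Approach 1: summary div that is a direct child of a dev_row and has no id attribute.
by_selector = soup.select_one("div.dev_row > div.summary:not([id])")

# Approach 2: find the label div by its exact text, then take the next summary sibling.
label = soup.find("div", class_="subtitle", string="Publisher:")
by_label = label.find_next_sibling("div", class_="summary") if label else None


def names(node):
    """Hypothetical helper: link texts inside node, or [] if node is None."""
    return [a.get_text(strip=True) for a in node.find_all("a")] if node else []


print(names(by_selector))
print(names(by_label))
```

Collecting the link texts separately, rather than calling `.text` on the whole div, keeps each publisher name as its own list item. The label-based lookup does not depend on which div happens to carry an id, but it does depend on the label text staying exactly `Publisher:` on every page.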
Tags: python, web-scraping, beautifulsoup, steam
