Extracting HTML content with Beautiful Soup in Python

Question  Votes: 0  Answers: 2

Hi, I am using the Beautiful Soup library to parse content from an HTML page.

I use the following script to get to the part of the page I want:

review_list = soup.find(class_="review_list_score_breakdown_right")

<span class=" review_list_score_breakdown_right">
 <ul class="review_score_breakdown_list list_tighten clearfix" data-et-view="bLTQHcXJVNRCSPOMcAQJO:1 bLTQHcXJVNRCSPOMcAQJO:3 " id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">
    Cleanliness
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_services">
   <p class="review_score_name">
    Facilities
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_staff">
   <p class="review_score_name">
    Staff
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_value">
   <p class="review_score_name">
    Value for money
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_wifi">
   <p class="review_score_name">
    Free WiFi
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_location">
   <p class="review_score_name">
    Location
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>

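I need to extract the score that belongs to a given data-question attribute. For example, to get the hotel's comfort score I need to access data-question="hotel_confort". I tried the find() function, but it doesn't work.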

python html web-scraping beautifulsoup
2 Answers
0
votes

There is no hotel_confort value in your HTML; the attribute value is spelled hotel_comfort.

    review = soup.find(class_="review_list_score_breakdown_right")
    hotel = review.find(attrs={"data-question" : "hotel_comfort"})

This code returns

<li class="clearfix one_col" data-question="hotel_comfort"> ..... </li>
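
To go from that <li> to the score itself, one option is to read the text of the review_score_value paragraph inside it. This is a minimal sketch, assuming a trimmed copy of the markup from the question (the score_bar divs are left out) and Python's built-in html.parser; the variable name score is only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: score_bar divs omitted).
html = """
<span class=" review_list_score_breakdown_right">
 <ul id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">Cleanliness</p>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">Comfort</p>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
review = soup.find(class_="review_list_score_breakdown_right")

# Locate the <li> by its data-question attribute (note the spelling: hotel_comfort).
hotel = review.find(attrs={"data-question": "hotel_comfort"})

# The score is the text of the nested <p class="review_score_value">;
# strip=True removes the surrounding whitespace and newlines.
score = hotel.find(class_="review_score_value").get_text(strip=True)
print(score)
```

If find() does not match anything it returns None, so on a real page it may be worth checking hotel before calling find() on it.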


0
votes

I think what you need is a find query that uses attrs. Your question is similar to "Extracting an attribute value with beautifulsoup".

I'll make it a bit more specific to your case.

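review = soup.find(class_="review_list_score_breakdown_right")
# The attribute value uses an underscore (hotel_comfort), matching the markup.
item = review.find(attrs={"data-question": "hotel_comfort"})
# The <li> has no "value" attribute; the score is the text of its review_score_value <p>.
output = item.find(class_="review_score_value").get_text(strip=True)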

It has been a while since I used bs4, so please debug the code.

Edit: here is some working code based on your sample string

review = soup.find('span', {'class' : "review_list_score_breakdown_right"})
matches = review.find_all(attrs={"data-question": "hotel_comfort"})
print(matches)  # prints the matching HTML, which you can drill into further
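
Drilling down further could look like the sketch below, which collects every score into a dictionary keyed by the data-question value. It assumes a trimmed copy of the sample markup, the built-in html.parser, and that each score parses as a number; the name scores is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup (assumption: only two of the seven items kept).
html = """
<span class=" review_list_score_breakdown_right">
 <ul id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">Cleanliness</p>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">Comfort</p>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
review = soup.find("span", {"class": "review_list_score_breakdown_right"})

scores = {}
# attrs={"data-question": True} matches any <li> that has the attribute at all.
for li in review.find_all("li", attrs={"data-question": True}):
    value = li.find(class_="review_score_value").get_text(strip=True)
    # Assumption: the score text is numeric, so float() can parse it.
    scores[li["data-question"]] = float(value)

print(scores["hotel_comfort"])
```

Keying by data-question rather than by the visible label keeps the lookup independent of the page language.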