Beautiful Soup Parse Python

Question · Votes: 1 · Answers: 2

I scraped the following HTML with BS4, but I can't seem to search for the artist tag. I assigned this snippet to a variable called container and then tried

print container.tr.td["artist"]

with no luck. Any suggestions appreciated.

<tr class="item">
  <!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">kool as the gang</td>
  <td class="venue">100 club</td>
  <td class="link">
  <p class="availability out-of-stock">
    <span>Off Sale</span></p>
  </td>
</tr>
python html web-scraping beautifulsoup
2 Answers

5 votes

Your syntax is wrong: "artist" is the value of the "class" attribute, not the name of an attribute. Try this:

from bs4 import BeautifulSoup

html = """
<tr class="item">
<!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
                        kool as the gang                     </td>
<td class="venue">100 club</td>
<td class="link">
<p class="availability out-of-stock">
<span>Off Sale</span></p>
</td>
</tr>
"""

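soup = BeautifulSoup(html, 'html.parser')
# "artist" is a value of the class attribute, so match on that value
td = soup.find('td', {'class': 'artist'})
# strip() removes the whitespace around the artist name
print(td.text.strip())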

Output:

kool as the gang
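To see why the lookup in the question fails: subscripting a tag, as in tag["name"], reads an HTML attribute called name, and container.tr.td is just the first <td> in the row (the commented-out image cell is a Comment node, not a tag). The sketch below assumes container is the whole parsed document rather than the <tr> itself, and trims the markup to two cells for brevity.

```python
from bs4 import BeautifulSoup

html = """
<tr class="item">
<!-- <td class="image">...</td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">kool as the gang</td>
</tr>
"""

# Assumption: container is the parsed document, not the <tr> tag itself.
container = BeautifulSoup(html, "html.parser")

# container.tr.td means "first <td> inside the first <tr>", i.e. the date cell.
first_td = container.tr.td
print(first_td.text)  # Sat, 30 Dec 2017

# Subscripting a Tag looks up an attribute by name; no attribute is
# literally called "artist", so this raises KeyError.
try:
    first_td["artist"]
except KeyError:
    print("no attribute named 'artist'")

# "class" is the attribute name; BeautifulSoup returns its values as a list.
print(first_td["class"])  # ['date']

# Search by the class value instead; class_ avoids the Python keyword clash.
artist = container.find("td", class_="artist")
print(artist.get_text(strip=True))  # kool as the gang
```

The class_ keyword is equivalent to the {'class': 'artist'} dictionary used in the answer above.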

2 votes

Another way.

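Use the select method to look in container for elements whose class is 'artist'. select can return several matches, but since you know there is only one, take the only element in the list and ask for its text attribute.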

>>> HTML = open('sven.htm').read()
>>> import bs4
>>> container = bs4.BeautifulSoup(HTML, 'lxml')
>>> container.select('.artist')[0].text
'\n                        kool as the gang                     '
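The surrounding whitespace in that output comes from the source markup; get_text(strip=True) trims it. When you expect exactly one match, select_one returns the tag directly (or None), so no [0] is needed. The sketch below is an illustration only: it uses the built-in html.parser instead of lxml and wraps the question's row in a <table>. The loop at the end is a hedged suggestion for pages with several tr.item rows; here it runs over the single row.

```python
from bs4 import BeautifulSoup

# The row from the question, wrapped in a <table> for this sketch.
html = """
<table>
<tr class="item">
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
                        kool as the gang                     </td>
<td class="venue">100 club</td>
</tr>
</table>
"""

container = BeautifulSoup(html, "html.parser")

# select_one returns the first matching tag (or None) instead of a list.
first_artist = container.select_one(".artist").get_text(strip=True)
print(first_artist)  # kool as the gang

# With several rows, loop over each row and pull the cells by class.
rows = []
for tr in container.select("tr.item"):
    rows.append({
        "date": tr.select_one(".date").get_text(strip=True),
        "artist": tr.select_one(".artist").get_text(strip=True),
        "venue": tr.select_one(".venue").get_text(strip=True),
    })
print(rows)
```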