I captured the following HTML using BS4, but I can't seem to search for the artist tag. I assigned this markup to a variable named container and then tried
print container.tr.td["artist"]
with no luck. Any suggestions would be appreciated.
<tr class="item">
<!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">kool as the gang</td>
<td class="venue">100 club</td>
<td class="link">
<p class="availability out-of-stock">
<span>Off Sale</span></p>
</td>
</tr>
Your syntax is wrong: "artist" is the value of the "class" attribute, not an attribute name. Try this:
from bs4 import BeautifulSoup
html = """
<tr class="item">
<!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
kool as the gang </td>
<td class="venue">100 club</td>
<td class="link">
<p class="availability out-of-stock">
<span>Off Sale</span></p>
</td>
</tr>
"""
soup = BeautifulSoup(html, 'html.parser')
td = soup.find('td',{'class': 'artist'})
print (td.text.strip())
Output:
kool as the gang
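To see why the original attempt fails, it may help to look at what each part of container.tr.td["artist"] does. Subscripting a tag looks up an HTML attribute by name, so ["artist"] asks for an attribute called artist, which does not exist; and container.tr.td is only the first td in the row, which here is the date cell. The following is a minimal sketch, assuming a trimmed-down copy of the markup and the html.parser backend:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the row from the question (assumption: same structure).
html = """
<tr class="item">
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">kool as the gang</td>
</tr>
"""
container = BeautifulSoup(html, "html.parser")

# Dotted access returns the FIRST matching child, i.e. the date cell.
first_td = container.tr.td
first_class = first_td["class"]  # class is multi-valued, so this is a list

# Subscripting looks up an attribute name; there is no attribute "artist".
try:
    first_td["artist"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Searching by class value finds the intended cell.
artist = container.find("td", class_="artist").get_text(strip=True)
print(first_class, lookup_failed, artist)
```

The class_ keyword is equivalent to passing {'class': 'artist'} as in the answer above.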
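Another way.
Use the select method on container to look for elements whose class is 'artist'. select returns a list because there may be several matches; since you know there is only one, take the first (and only) element of the list and read its text attribute. The text keeps the surrounding whitespace, so call .strip() on it if you need a clean value.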
>>> HTML = open('sven.htm').read()
>>> import bs4
>>> container = bs4.BeautifulSoup(HTML, 'lxml')
>>> container.select('.artist')[0].text
'\n kool as the gang '
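If exactly one match is expected, select_one may be a little more convenient than indexing the list from select: it returns the first match, or None when nothing matches, instead of raising an IndexError. A minimal sketch, assuming a trimmed-down copy of the row and the html.parser backend (the td.support selector is a made-up example of a class that is absent):

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the row (assumption: same structure as the question).
html = """
<tr class="item">
<td class="artist">
 kool as the gang </td>
<td class="venue">100 club</td>
</tr>
"""
container = BeautifulSoup(html, "html.parser")

# select_one returns a single Tag or None.
cell = container.select_one("td.artist")
artist = cell.get_text(strip=True) if cell is not None else None

# Hypothetical class that is not in the markup: the result is None.
missing = container.select_one("td.support")
print(repr(artist), missing)
```

get_text(strip=True) trims the leading newline and trailing space that appear in the raw text value shown above.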