Extracting only the id number from a URL with Beautiful Soup

Question

I'm new to Python, but I'm trying to build a web scraper.

I've gotten as far as this:

print(soup.td.a)

which prints:

<a href="/?p=section&amp;a=details&amp;id=37627">Some Text Here</a>

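I'm trying to work out how to filter this further so that the only result is the id number.

I've tried a number of approaches, including urlparse and re.compile, but I can't get the syntax right. I also suspect there is a simpler way that I haven't found. I'd appreciate any help. Thanks.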

Answer 1

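You can take the href attribute of the tag, split it with urlparse(), and pass the query string to parse_qs(), which returns a dict mapping each parameter name to a list of values: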


from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs

html_content = '''
<td>
    <a href="/?p=section&amp;a=details&amp;id=37627">Some Text Here</a>
</td>
'''

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Find the <a> tag
a_tag = soup.find('a')

# Extract the href attribute
href = a_tag.get('href')

# Parse the URL to get the query parameters
parsed_url = urlparse(href)
# On Python 2, use "import urlparse" and call urlparse.urlparse(href) / urlparse.parse_qs(...)
query_params = parse_qs(parsed_url.query)

# Get the 'id' parameter
id_value = query_params.get('id', [None])[0]

print(id_value)  # Output: 37627
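
Since the question also mentions re.compile, and a real page usually has more than one link, here is a minimal sketch that wraps both approaches in small helpers and applies them to a list of href strings. The names id_from_query, id_from_regex, and ID_PATTERN are hypothetical and chosen only for illustration; the regex variant assumes the id is purely numeric.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical helper names; not part of Beautiful Soup or the answer above.
# Assumption: the id value consists only of digits.
ID_PATTERN = re.compile(r"[?&]id=(\d+)")


def id_from_query(href):
    """Return the 'id' query parameter of href as a string, or None if absent."""
    values = parse_qs(urlparse(href).query).get("id")
    return values[0] if values else None


def id_from_regex(href):
    """Regex alternative: return the digits after 'id=', or None if no match."""
    match = ID_PATTERN.search(href)
    return match.group(1) if match else None


# Beautiful Soup already decodes &amp; to & in attribute values,
# so the hrefs it returns look like this:
hrefs = [
    "/?p=section&a=details&id=37627",
    "/?p=section&a=list",  # no id parameter
]

ids = [id_from_query(h) for h in hrefs]
print(ids)  # ['37627', None]
```

To feed this from a parsed page, the list could be built with something like `[a['href'] for a in soup.find_all('a', href=True)]`. The parse_qs version is probably the safer default because it does not depend on parameter order or on the id being numeric; the regex is shorter but more brittle.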
