Getting <a> tag content with BeautifulSoup

Question

I want to get the content of an <a> tag in Python using BeautifulSoup (version 4.12.3). I have this code and HTML sample:

import bs4

h = """
<a id="0">
    <table> 
  <thead>
    <tr>
      <th scope="col">Person</th>
      <th scope="col">Most interest in</th>
      <th scope="col">Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Chris</th>
      <td>HTML tables</td>
      <td>22</td>
    </tr>
    </table>
</a>
"""

test = bs4.BeautifulSoup(h)
test.find('a')  # find_all, select => same results

But it only returns:

<a id="0">
</a>

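I expected the content of the <table> to appear between the <a> tags. (I don't know whether wrapping a table in an <a> tag is common, but the HTML I'm trying to read is written this way.)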

I need to parse the table content through the <a> tag because I need to link id="0" to the contents of the table.

How can I do this? How do I get the content of the <table> tag through the <a> tag?

Answer 1

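Explicitly specify the parser you want to use (here, html.parser). By default, BeautifulSoup uses the "best" parser available (I presume lxml in your case), which does not parse this document well: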

import bs4

h = """
<a id="0">
    <table> 
  <thead>
    <tr>
      <th scope="col">Person</th>
      <th scope="col">Most interest in</th>
      <th scope="col">Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Chris</th>
      <td>HTML tables</td>
      <td>22</td>
    </tr>
    </table>
</a>
"""

test = bs4.BeautifulSoup(h, "html.parser")  # <-- define parser here
out = test.find("a")

print(out)

Prints:

<a id="0">
<table>
<thead>
<tr>
<th scope="col">Person</th>
<th scope="col">Most interest in</th>
<th scope="col">Age</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Chris</th>
<td>HTML tables</td>
<td>22</td>
</tr>
</tbody></table>
</a>
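
Once the table is kept inside the <a> tag, the stated goal of linking id="0" to the table's contents could be sketched roughly as follows. This is only an illustration under assumptions: the helper name tables_by_anchor_id and the variable names are hypothetical and do not come from the question, and the sketch assumes every <a> of interest carries an id attribute and wraps its table rows directly.

```python
import bs4

# Same HTML structure as in the question.
html = """
<a id="0">
    <table>
  <thead>
    <tr>
      <th scope="col">Person</th>
      <th scope="col">Most interest in</th>
      <th scope="col">Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Chris</th>
      <td>HTML tables</td>
      <td>22</td>
    </tr>
    </table>
</a>
"""

# Name the parser explicitly so the table stays inside the <a> tag.
soup = bs4.BeautifulSoup(html, "html.parser")


def tables_by_anchor_id(soup):
    """Hypothetical helper: map each <a id=...> to the cell text of its table rows."""
    result = {}
    # id=True matches only <a> tags that actually have an id attribute.
    for anchor in soup.find_all("a", id=True):
        rows = []
        for tr in anchor.find_all("tr"):
            # Collect header and data cells in document order.
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            rows.append(cells)
        result[anchor["id"]] = rows
    return result


tables = tables_by_anchor_id(soup)
print(tables)
# {'0': [['Person', 'Most interest in', 'Age'], ['Chris', 'HTML tables', '22']]}
```

Keying the dictionary by the id attribute keeps each table tied to its <a> tag. If the real page nests tables differently, the row lookup would need adjusting.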
