Removing elements from HTML using BeautifulSoup and Python 3

Question

I am scraping data from the web and trying to remove every element with the tag 'div' and the class 'notes module', like the HTML below:

        <div class="notes module" role="complementary">
  <h3 class="heading">Notes:</h3>
    <ul class="associations">
        <li>
          Translation into Русский available: 
            <a href="/works/494195">Два-два-один Браво Бейкер</a> by <a rel="author" href="/users/dzenka/pseuds/dzenka">dzenka</a>, <a rel="author" href="/users/La_Ardilla/pseuds/La_Ardilla">La_Ardilla</a>
        </li>
    </ul>
    <blockquote class="userstuff">
      <p>
  <i>Warnings: numerous references to and glancing depictions of combat, injury, murder, and mutilation of the dead; deaths of minor and major original characters. Numerous explicit depictions of sex between two men.</i>
</p>
    </blockquote>
    <p class="jump">(See the end of the work for <a href="#children">other works inspired by this one</a>.)</p>
</div>

The source is here: view-source:http://archiveofourown.org/works/180121?view_full_work=true

I am having trouble even finding and printing the elements I want to remove. So far I have:

import urllib.request, urllib.parse, urllib.error
from lxml import html
from bs4 import BeautifulSoup

url = 'http://archiveofourown.org/works/180121?view_full_work=true'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'lxml')
removals = soup.find_all('div', {'id':'notes module'})
for match in removals:
    match.decompose()

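But removals comes back as an empty list. Can you help me select the whole div element shown above, so that I can find and remove all such elements from the HTML?

Thank you.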

Answer 1

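The div you are trying to find has class="notes module", but in your code you are looking for those divs by id="notes module". Change this line:

removals = soup.find_all('div', {'id':'notes module'})

to this:

removals = soup.find_all('div', {'class':'notes module'})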
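
To see the difference between the two lookups without fetching the page, here is a minimal sketch that runs both against a small inline snippet. The sample markup, the `strip_notes` helper, and the use of the built-in `html.parser` are assumptions made for illustration; the real page is much larger and the question uses `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one notes block plus content to keep.
SAMPLE_HTML = """
<div id="workskin">
  <div class="notes module" role="complementary">
    <h3 class="heading">Notes:</h3>
    <p>Translation available.</p>
  </div>
  <div class="userstuff module">
    <p>Story text to keep.</p>
  </div>
</div>
"""


def strip_notes(markup):
    """Remove every <div class="notes module"> and report how many each lookup found."""
    soup = BeautifulSoup(markup, "html.parser")

    # The lookup from the question: no element has id="notes module".
    by_id = soup.find_all("div", {"id": "notes module"})

    # The corrected lookup: matches on the class attribute instead.
    by_class = soup.find_all("div", {"class": "notes module"})

    for match in by_class:
        match.decompose()  # drops the tag and everything inside it

    return soup, len(by_id), len(by_class)


soup, id_hits, class_hits = strip_notes(SAMPLE_HTML)
print(id_hits, class_hits)
print(soup.get_text(" ", strip=True))
```

If the class order or extra classes might vary, a CSS selector such as `soup.select('div.notes.module')` matches any div that carries both classes.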

Answer 2

Give this a try. It removes every div found under class='wrapper' on that page.
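
As a minimal sketch of that idea, and not necessarily the answerer's own code, the snippet below removes every div nested under an element with class 'wrapper'. The inline markup and variable names are hypothetical, and `html.parser` is assumed so the example runs without fetching the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a wrapper holding two divs and some content to keep.
SAMPLE_HTML = """
<div class="wrapper">
  <h2 class="title">Work title</h2>
  <div class="notes module"><p>Notes to drop.</p></div>
  <p>Text to keep.</p>
  <div class="other"><p>Also dropped.</p></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

removed = []
for wrapper in soup.select(".wrapper"):
    # find_all returns a list up front, so removing tags inside the loop is safe.
    for div in wrapper.find_all("div"):
        removed.append(div.extract())  # extract() detaches the tag and returns it

print(len(removed))
print(soup.get_text(" ", strip=True))
```

Note that this removes all nested divs, not only the notes blocks, so it is broader than the class-based lookup in Answer 1.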
