Replace <br> with a space in BeautifulSoup output

Question (votes: 1, answers: 3)

I am scraping some links with BeautifulSoup, but it seems to ignore <br> tags completely.

Here is the relevant part of the source of the page I am scraping:

    <h1 class="para-title">A quick brown fox jumps over<br>the lazy dog <span id="something">&#xe800;</span></h1>

Here is my BeautifulSoup code (relevant part only) for getting the text inside the h1 tag:

    soup = BeautifulSoup(page, 'html.parser')
    title_box = soup.find('h1', attrs={'class': 'para-title'})
    title = title_box.text.strip()
    print(title)

This gives the following output:

    A quick brown fox jumps overthe lazy dog

whereas I was expecting:

    A quick brown fox jumps over the lazy dog

How can I replace <br> with a space in my code?
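
For context, the behavior described in the question can be reproduced in a few lines. This is a minimal sketch, assuming a simplified copy of the h1 markup (the `html` and `fragments` names are illustrative, not from the original page). It suggests that the tag is not so much ignored as empty: <br> holds no text of its own, and `.text` joins the surrounding text fragments with an empty string.

```python
from bs4 import BeautifulSoup

# Illustrative markup modeled on the question's <h1>; variable names are assumptions.
html = '<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog</h1>'

soup = BeautifulSoup(html, "html.parser")
title_box = soup.find("h1", attrs={"class": "para-title"})

# The <br> tag sits between two text nodes but contributes no text itself.
fragments = list(title_box.strings)
print(fragments)

# .text is get_text() with the default empty separator,
# so the fragments are concatenated with nothing between them.
print(title_box.text)
```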

python web-scraping beautifulsoup
3 answers
3
votes

How about using .get_text() with the separator parameter?

    from bs4 import BeautifulSoup

    page = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
    <span>some stuff here</span></h1>'''

    soup = BeautifulSoup(page, 'html.parser')
    title_box = soup.find('h1', attrs={'class': 'para-title'})
    # Join the tag's text fragments with a space instead of the default empty string
    title = title_box.get_text(separator=" ").strip()
    print(title)

Output:

    A quick brown fox jumps over the lazy dog
     some stuff here

2
votes

Use replace() on the HTML before parsing:

    from bs4 import BeautifulSoup

    html = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
    <span>some stuff here</span></h1>'''

    # Swap each literal <br> for a space before the HTML is parsed
    html = html.replace("<br>", " ")
    soup = BeautifulSoup(html, 'html.parser')
    title_box = soup.find('h1', attrs={'class': 'para-title'})
    title = title_box.get_text().strip()
    print(title)

Output:

    A quick brown fox jumps over the lazy dog
    some stuff here

EDIT:

For the part the OP mentioned in the comments below:

    html = '''<div class="description">Planet Nine was initially proposed to explain the clustering of orbits
    Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.
    </div>'''

    from bs4 import BeautifulSoup

    # Turn every newline in the raw HTML into ". " before parsing
    html = html.replace("\n", ". ")
    soup = BeautifulSoup(html, 'html.parser')
    div_box = soup.find('div', attrs={'class': 'description'})
    divText = div_box.get_text().strip()
    print(divText)

Output:

    Planet Nine was initially proposed to explain the clustering of orbits. Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four..

0
votes

Use the str.replace function.
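
One caveat with a literal str.replace("<br>", " ") is that it only matches that exact spelling, so variants such as <br/> or <BR> would slip through. As an alternative that is not taken from the answers above, the tags can be replaced in the parsed tree instead. The sketch below assumes a hypothetical helper named `text_with_br_as_space` and a made-up sample string; it relies only on find_all(), replace_with() and get_text().

```python
from bs4 import BeautifulSoup


def text_with_br_as_space(html, tag_name, css_class):
    """Return the text of the first matching tag, with each <br> turned into a space.

    Hypothetical helper for illustration; returns None when no tag matches.
    """
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find(tag_name, attrs={"class": css_class})
    if box is None:
        return None
    # find_all() returns a list, so the tree can be modified while looping over it.
    for br in box.find_all("br"):
        br.replace_with(" ")
    return box.get_text().strip()


# Made-up sample mixing two spellings of the tag.
sample = '<h1 class="para-title">A quick brown fox<br/>jumps over<BR>the lazy dog</h1>'
print(text_with_br_as_space(sample, "h1", "para-title"))
```

Because the parser normalizes tag names, both spellings end up as the same br element, which is why no string matching on the raw HTML is needed here.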
