Using Python, how do I extract the descriptive text of links from a Google search?

Question

In Python 3, I have this script to scrape the first page of Google search results:

from bs4 import BeautifulSoup
from selenium import webdriver

nome = '"ALDEANNO CAMPOS"'
nome = nome.replace(' ', '+')
cargo = 'DEPUTADO FEDERAL'

busca = f'https://www.google.com.br/search?q={nome}+{cargo}+ditadura'

# In recent Selenium 4 releases the first positional argument is `options`,
# not a FirefoxProfile, so start Firefox with its defaults
browser = webdriver.Firefox()
browser.get(busca)

html = browser.page_source
browser.quit()  # quit() also shuts down geckodriver, unlike close()
soup = BeautifulSoup(html, "html.parser")

# Each organic result is wrapped in a <div class="rc">
page = soup.find_all("div", {"class": "rc"})

for link in page:
    a = link.find("a")
    if a is None:
        continue  # skip result blocks that contain no link
    href = a['href']
    texto = a.text
    print(href)
    print(texto)
    print("---------------")
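Building the query by replacing spaces with `+` works for this name, but it does not escape the double quotes, accented letters, or characters such as `&`. A minimal sketch using the standard library's `urllib.parse.urlencode`, which percent-encodes every part of the query; `build_search_url` is a hypothetical helper name, not part of any library:

```python
from urllib.parse import urlencode


def build_search_url(nome: str, cargo: str, extra: str = "ditadura") -> str:
    """Build a Google search URL with the query parameter percent-encoded.

    build_search_url is a hypothetical helper for illustration only.
    """
    # Join the terms with spaces; urlencode turns spaces into '+' and
    # escapes the double quotes and any accented letters (as UTF-8).
    termos = f"{nome} {cargo} {extra}"
    return "https://www.google.com.br/search?" + urlencode({"q": termos})


if __name__ == "__main__":
    print(build_search_url('"ALDEANNO CAMPOS"', "DEPUTADO FEDERAL"))
```

The resulting URL can be passed to `browser.get()` unchanged, and names containing letters such as `ã` or `é` are encoded the same way.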

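The program displays (captures) the href link and the link's descriptive text, that is, the page title. But I also want to extract the phrases that appear below each link in the Google results.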

For example, on this page (https://www.google.com/search?client=ubuntu&channel=fs&ei=DrSNW8r3E4urwgS977WYDA&q=ALDEANNO+CAMPOS+deputado+federal+ditadura&oq=ALDEANNO+CAMPOS+deputado+federal+ditadura&gs_l=psy-ab.12...0.0.0.1933260.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.U9iFnwXwzpk), the texts:

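“Aug 24, 2018 - Full profile of Aldeanno Campos, candidate for Federal Deputy for the PRP in the 2018 elections in Pará.”

“We refer to the following Brazilian senators and federal deputies who were deprived of their liberty ... .... Epílogo de Campos · Costa Rego · Recife, PE, PTB-PE (1962) ...”

“Francisco Luís da Silva Campos (Dores do Indaiá, November 18, 1891 - Belo Horizonte, ... In 1921, Francisco Campos was elected federal deputy for the PRM, in ... the Armed Forces first announced the preparations that would lead to the Estado Novo dictatorship, installed by the coup decree of November 1937.”

and so on.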

Does anyone know how to capture the text that appears below each link?

How the result for the name “CORONEL FERES” appears with print(link) (the HTML code could not be displayed here):

PSL Itapema - Posts | Facebookhttps://www.facebook.com/PSLitapema17/posts/1638801189535968General Mourão apoia o pré-cadidato a Deputado Federal Coronel Feres. Confira: 37 Views .... Há uma ditadura silenciosa que não podemos permitir. Bom dia!
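The title and URL run together ("Facebookhttps://...") because `.text` concatenates every text node inside the element with no separator. The sketch below assumes, as the output above suggests, that the displayed URL sat in a `<cite>` inside the same `<a>` as the `<h3>` title; the HTML string is a hypothetical, simplified stand-in for the real page. `get_text()` accepts a separator and a `strip` flag, or the title can be read from the `<h3>` alone:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on an old Google result;
# the real page is much larger and its class names change over time.
HTML = (
    '<div class="rc">'
    '<a href="https://www.facebook.com/PSLitapema17/posts/1638801189535968">'
    '<h3>PSL Itapema - Posts | Facebook</h3>'
    '<cite>https://www.facebook.com/PSLitapema17/posts/1638801189535968</cite>'
    '</a>'
    '<span class="st">General Mourão apoia o pré-cadidato a Deputado Federal Coronel Feres.</span>'
    '</div>'
)

soup = BeautifulSoup(HTML, "html.parser")
a = soup.find("div", class_="rc").find("a")

# Default get_text(): text nodes glued together, as seen in the question
fused = a.get_text()
# A newline separator plus strip=True keeps the title and URL apart
separated = a.get_text("\n", strip=True)
# Or read only the title from the <h3> and the link from href
title = a.find("h3").get_text(strip=True)
url = a["href"]

print(fused)
print(separated)
print(title, url, sep=" -> ")
```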

Answer 1

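You only need to add one line inside the loop that reads the snippet from the `<span class="st">` element; see the code below.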

from bs4 import BeautifulSoup
from selenium import webdriver

nome = '"ALDEANNO CAMPOS"'
nome = nome.replace(' ', '+')
cargo = 'DEPUTADO FEDERAL'

busca = f'https://www.google.com.br/search?q={nome}+{cargo}+ditadura'

# In recent Selenium 4 releases the first positional argument is `options`,
# not a FirefoxProfile, so start Firefox with its defaults
browser = webdriver.Firefox()
browser.get(busca)

html = browser.page_source
browser.quit()  # quit() also shuts down geckodriver, unlike close()
soup = BeautifulSoup(html, "html.parser")

# Each result is a <div class="rc">; its snippet is in a <span class="st">
page = soup.find_all("div", {"class": "rc"})

for link in page:
    href = link.find("a")['href']
    texto = link.find("a").text
    # Not every result has a snippet, so check before calling .text
    snippet = link.find('span', attrs={'class': 'st'})
    body = snippet.text if snippet is not None else ''
    print(href)
    print(texto)
    print(body)
    print("---------------")
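Calling `.text` directly on the result of `find()` raises `AttributeError` whenever a result block has no matching tag. A minimal sketch that separates the parsing from Selenium, so it can be tried on saved HTML or on `browser.page_source`; it assumes the `rc` and `st` classes used in the answer, which Google has since changed, so they may no longer match the live page. `extrair_resultados` and the sample HTML are hypothetical:

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def extrair_resultados(html: str) -> List[Tuple[str, str, str]]:
    """Return (href, title, snippet) for each <div class="rc"> block.

    extrair_resultados is a hypothetical helper; the class names are the
    ones the answer relies on and may be outdated.
    """
    soup = BeautifulSoup(html, "html.parser")
    resultados = []
    # class_="rc" also matches divs whose class list contains "rc" among others
    for bloco in soup.find_all("div", class_="rc"):
        a = bloco.find("a", href=True)
        if a is None:
            continue  # skip blocks without a link
        titulo_tag = a.find("h3")
        # Prefer the <h3> title; fall back to all text inside the link
        titulo = (titulo_tag or a).get_text(" ", strip=True)
        st = bloco.find("span", class_="st")
        # Some results have no snippet: use "" instead of raising
        snippet = st.get_text(" ", strip=True) if st is not None else ""
        resultados.append((a["href"], titulo, snippet))
    return resultados


# Hypothetical sample: the second result has an extra class and no snippet
SAMPLE = (
    '<div class="rc"><a href="https://example.com/a"><h3>Page A</h3></a>'
    '<span class="st">Snippet A</span></div>'
    '<div class="rc extra"><a href="https://example.com/b"><h3>Page B</h3></a></div>'
)

if __name__ == "__main__":
    for href, titulo, snippet in extrair_resultados(SAMPLE):
        print(href)
        print(titulo)
        print(snippet)
        print("---------------")
```

With Selenium, the same function would be called as `extrair_resultados(browser.page_source)` before quitting the browser.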
