Extract specific div content from a soup in Python?


I want to scrape the following URL:

l = "https://cfpub.epa.gov/compliance/criminal_prosecution/index.cfm?action=3&prosecution_summary_id=3420&searchParams=M5%2C%3A%2FXT%2A%5CCYZ%40O%3B%20W%5F%2AYN5%5E%3EK99%2A%29W%3CU%3FV%23DH%5BZ4%247TRPH%3BJQH%229%3FD%3C%26Z%40CY%26%0AM7EFH%21%25%21%3A%23%3DV%40%3A%2A%5F%3AB8%2A%5DR%3BB%25%5E9%5B2D%22I2KE65NEY7M%21%2DU%40%2B8%22J%29Y%23%24LNJ%40DX%24%0A%2F5YJ%3EP%27O%5FK04%5FG%5C%3E%290M4%2E%0A"

I wrote the following code to get the content of that page:

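from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ChromeOptions

options = ChromeOptions()
options.add_argument("--headless")          # run Chrome without opening a window
driver = webdriver.Chrome(options=options)  # pass the options by keyword
driver.get(l)                               # l is the URL string above
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()                               # close the browser once the HTML is captured
# The dict form needs a colon, not "="; this matches every div whose class list contains cpRow
cp_row_divs = soup.find_all('div', {'class': 'cpRow'})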

But it returns the data from both the cpRow and the cpRow odd divs. I want the two kept separate, with the data stored in the following variables:

1) FISCAL YEAR = 2023
2) summary = October 7, 2022
Zachary Czubak was sentenced to serve 5 years of probation, complete 180 hours of community work service, and pay a $66,000 criminal fine.
CITATION: 18 U.S.C. 371
3) full_text = Zachary Czubak, Patrick Fleming and their co-defendant tampered with federally mandated monitoring devices on private and commercial diesel vehicles and removed required air pollution control equipment on at least 37 vehicles between July 2019 and September 2020. In July 2019, the co-owners of Arm Rippin Toys, including Czubak and Fleming, entered into an agreement to engage in “tuning and deleting” customers’ diesel vehicles. This process involves the removal of emissions control systems which are designed to reduce pollutants being emitted from the vehicles. Under normal operating conditions, an on-board diagnostic (OBD) system will detect any removal and/or malfunction of a vehicle’s emissions control equipment. By modifying OBDs on vehicles, Arm Rippin Toy’s co-owners and employees falsified, tampered with and rendered inaccurate the vehicles’ monitoring devices so that the modified vehicle could continue to function despite the removal or deletion of emissions control equipment. In total, Arm Rippin Toys collected approximately $100,000 for performing unlawful deletes and tunes on diesel vehicles.
February 10, 2023
Patrick Fleming was sentenced to serve 5 years of probation and pay a $66,000 fine.
CITATION: 18 U.S.C. 371
STATUTE: Clean Air Act (CAA)
Title 18 U.S. Criminal Code (TITLE 18)
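A likely reason both kinds of row come back: BeautifulSoup treats class as a multi-valued attribute, so searching for cpRow matches any div whose class list contains cpRow, including class="cpRow odd". The sketch below uses a small hypothetical HTML snippet (not the real EPA page markup; only the class names come from the question) to show this, and then splits the rows by checking the parsed class list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the class names
# "cpRow" and "odd" are taken from the question.
html = """
<div class="cpRow">FISCAL YEAR 2023</div>
<div class="cpRow odd">Summary</div>
<div class="cpRow">Full text</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so this matches both "cpRow" and "cpRow odd".
all_rows = soup.find_all("div", class_="cpRow")

# tag["class"] is a list such as ["cpRow", "odd"], so the rows can be
# split by checking whether "odd" appears in that list.
plain_rows = [d for d in all_rows if "odd" not in d["class"]]
odd_rows = [d for d in all_rows if "odd" in d["class"]]

print(len(all_rows), len(plain_rows), len(odd_rows))  # 3 2 1
```

Checking the class list in Python keeps a single find_all call; the CSS-selector approach achieves the same split directly in the query.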

Any help would be appreciated.

Tags: python, selenium-webdriver

1 Answer

With the soup.select() method you can use CSS selectors to target specific elements.

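For example,

soup.find_all('div', {'class': 'cpRow'})
can be rewritten as
soup.select('div.cpRow')

Note that both forms also match divs with class="cpRow odd", because a class selector matches any element whose class list contains that name.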
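A quick check that the dictionary form, the class_ keyword form and the CSS selector select the same elements. The HTML below is a hypothetical stand-in; the span is there only to show that the div part of the selector restricts the tag name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names come from the question.
html = """
<div class="cpRow">row 1</div>
<div class="cpRow odd">row 2</div>
<span class="cpRow">not a div</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Dictionary form (note the colon, not "=") and the class_ keyword form.
by_dict = soup.find_all("div", {"class": "cpRow"})
by_kwarg = soup.find_all("div", class_="cpRow")

# CSS selector form: a div element carrying the class cpRow.
by_css = soup.select("div.cpRow")

texts = [d.get_text() for d in by_css]
print(texts)                          # ['row 1', 'row 2']
print(by_dict == by_kwarg == by_css)  # True
```

All three include the "cpRow odd" row, which is why a further filter is needed to separate the two kinds.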

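To then exclude the elements that also carry the odd class, you can use

soup.select('div.cpRow:not(.odd)')

and, conversely, soup.select('div.cpRow.odd') returns only the odd rows.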
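A minimal sketch of getting the two groups as separate lists of text, assuming hypothetical markup in which each row's text sits directly inside its div (the real EPA page may nest labels and values differently, so the text extraction would likely need adjusting).

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure may differ.
html = """
<div class="cpRow">FISCAL YEAR: 2023</div>
<div class="cpRow odd">Summary text</div>
<div class="cpRow">Full text</div>
<div class="cpRow odd">STATUTE: Clean Air Act (CAA)</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Rows that have cpRow but NOT odd.
plain_rows = soup.select("div.cpRow:not(.odd)")
# Rows that have both classes (compound selector, no space between the classes).
odd_rows = soup.select("div.cpRow.odd")

plain_texts = [d.get_text(strip=True) for d in plain_rows]
odd_texts = [d.get_text(strip=True) for d in odd_rows]
print(plain_texts)  # ['FISCAL YEAR: 2023', 'Full text']
print(odd_texts)    # ['Summary text', 'STATUTE: Clean Air Act (CAA)']
```

Writing .cpRow .odd with a space would instead mean "an .odd element inside a .cpRow element", which is a different query.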

For more on CSS selectors, see https://www.w3schools.com/cssref/css_selectors.php
