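I'm new to Python and am learning it for web scraping. I'm using BeautifulSoup to collect the descriptions from job offers.
On another job-offer site, the same code with a different div class finds what I want. I wrote this code for justjoin.it: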
import requests
from bs4 import BeautifulSoup
link="https://justjoin.it/offers/jungle-devops-engineer"
response_IDs=requests.get(link)
soup=BeautifulSoup(response_IDs.text, 'html.parser')
Search_part = soup.find(id='root')
description= Search_part.find_all('div', class_='css-gz8dae')
for i in description:
    print(i)
Please help me write code that works.
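A likely reason the snippet above prints nothing is that justjoin.it builds the offer page with JavaScript, so the HTML that requests receives may not contain the div at all. The following is a minimal sketch of a check for that, assuming the class name css-gz8dae from the question. class_in_static_html and fetch_static_html are hypothetical helper names, not part of requests or BeautifulSoup.

```python
def class_in_static_html(html: str, class_name: str) -> bool:
    """Return True if class_name occurs anywhere in the raw HTML text."""
    return class_name in html


def fetch_static_html(url: str) -> str:
    """Download the page exactly as the server sends it (no JavaScript is run)."""
    import requests  # imported lazily so the check above works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


# Example with inline HTML (no network needed); both strings are made up.
rendered = '<div id="root"><div class="css-gz8dae">Job description</div></div>'
static = '<div id="root"></div><script src="/app.js"></script>'
print(class_in_static_html(rendered, "css-gz8dae"))  # True
print(class_in_static_html(static, "css-gz8dae"))    # False
```

If class_in_static_html(fetch_static_html(link), "css-gz8dae") returns False for the real page, the description is most likely added in the browser, and a tool that executes JavaScript is needed.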
First, make sure you have Selenium installed:
pip install selenium
On Google Colab, add a ! in front of pip install (see below). As I mentioned, I run all my Python on Google Colab with Firefox. This works for me:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
link = "https://justjoin.it/offers/jungle-devops-engineer"
# Set up headless browser (no GUI)
options = Options()
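# Newer Selenium 4 releases no longer accept options.headless = True; pass the flag instead
options.add_argument("--headless")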
browser = webdriver.Firefox(options=options)
# Use Selenium to get the page source after JavaScript has executed
browser.get(link)
page_source = browser.page_source
browser.quit()
# Use BeautifulSoup to parse the HTML
soup = BeautifulSoup(page_source, 'html.parser')
description = soup.find_all('div', class_='css-gz8dae')
for i in description:
    print(i.text)
This is the output:
Running a flexible Machine Learning engine at scale is hard.
We must ingest and process large volumes of data
uninterruptedly and store it in a scalable manner.
The data needs to be prepared and served to hundreds of
models constantly. All the predictions of the models, as well as other data pipelines, ...
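The script reads page_source as soon as browser.get returns, so on a slow connection the description may not have been rendered yet and the loop could print nothing. A hedged sketch of one way around that is an explicit wait with Selenium's WebDriverWait. The helper names class_selector and fetch_rendered_html and the 15 second default timeout are assumptions for illustration.

```python
def class_selector(tag: str, class_name: str) -> str:
    """Build a CSS selector such as 'div.css-gz8dae' from a tag and a class."""
    return f"{tag}.{class_name}"


def fetch_rendered_html(url: str, selector: str, timeout: float = 15.0) -> str:
    """Open url in headless Firefox, wait for selector to appear, return the HTML."""
    # Selenium is imported here so class_selector can be used without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    browser = webdriver.Firefox(options=options)
    try:
        browser.get(url)
        # Raises selenium.common.exceptions.TimeoutException if nothing
        # matches the selector within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return browser.page_source
    finally:
        browser.quit()  # always close the browser, even after a timeout
```

It could be called as fetch_rendered_html(link, class_selector("div", "css-gz8dae")), with the returned string passed to BeautifulSoup as before. Class names like css-gz8dae look auto-generated, so they may change when the site is redeployed.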
If you use Chrome, change this line
browser = webdriver.Firefox(options=options)
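to this (and import Options from selenium.webdriver.chrome.options instead of selenium.webdriver.firefox.options):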
browser = webdriver.Chrome(options=options)
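Because each browser has its own Options class in Selenium, it can be convenient to keep the browser choice in one place. This is only a sketch: make_headless_browser is a hypothetical helper, and it assumes the matching driver (GeckoDriver or ChromeDriver) is available on the machine.

```python
def make_headless_browser(name: str = "firefox"):
    """Return a headless Selenium WebDriver for 'firefox' or 'chrome'."""
    name = name.lower()
    if name not in ("firefox", "chrome"):
        raise ValueError(f"unsupported browser: {name!r}")

    # Imported lazily so the argument check above runs without Selenium.
    from selenium import webdriver

    if name == "firefox":
        from selenium.webdriver.firefox.options import Options
        driver_class = webdriver.Firefox
    else:
        from selenium.webdriver.chrome.options import Options
        driver_class = webdriver.Chrome

    options = Options()
    options.add_argument("--headless")  # run without opening a window
    return driver_class(options=options)
```

With this helper, browser = make_headless_browser("chrome") would replace the three lines that build the options and the driver.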
To run the whole program on Google Colab, you first need to install Selenium and Firefox like this:
!pip install selenium
!apt-get update
!apt install -y firefox
!apt install -y wget
!apt install -y unzip
Then you also need GeckoDriver, which must be on the system PATH:
!wget https://github.com/mozilla/geckodriver/releases/download/v0.30.0/geckodriver-v0.30.0-linux64.tar.gz
!tar -xvf geckodriver-v0.30.0-linux64.tar.gz
!chmod +x geckodriver
!mv geckodriver /usr/local/bin/
After these installations, run the code above.
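Before running the scraper, it can help to confirm that both executables really are on the PATH. The sketch below uses shutil.which from the standard library. missing_tools is a hypothetical helper name, and the executable names firefox and geckodriver are assumed to match what the install commands above produce.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the executables from `names` that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools(["firefox", "geckodriver"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("firefox and geckodriver are both on PATH")
```

The which parameter exists only so the lookup can be swapped out when testing; in normal use the default shutil.which is enough.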