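Basically, I am trying to scrape a website, but the response does not contain the data I need. I printed response.text, but it does not include the dynamic data, only the non-dynamic content. Printing the response itself does not give me the data either.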
import requests
from bs4 import BeautifulSoup

# Set up the URL
url = "https://www.amazon.jobs/en/search?base_query=&loc_query="
# Send a GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all job titles
    job_titles = soup.find_all('h3', class_='job-title')
    # Print the job titles
    for title in job_titles:
        print(title.text.strip())
else:
    print("Failed to retrieve the page. Status code:", response.status_code)
I tried BeautifulSoup, Scrapy, and plain requests, but none of them worked.
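The job listings are not in the initial HTML; the page loads them afterwards with a background request that returns JSON. We can call that background endpoint directly and extract the data from its response.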
import requests

# Set up the URL of the background JSON endpoint
url = "https://www.amazon.jobs/en/search.json?radius=24km&facets%5B%5D=normalized_country_code&facets%5B%5D=normalized_state_name&facets%5B%5D=normalized_city_name&facets%5B%5D=location&facets%5B%5D=business_category&facets%5B%5D=category&facets%5B%5D=schedule_type_id&facets%5B%5D=employee_class&facets%5B%5D=normalized_location&facets%5B%5D=job_function_id&facets%5B%5D=is_manager&facets%5B%5D=is_intern&offset=0&result_limit=100&sort=relevant&latitude=&longitude=&loc_group_id=&loc_query=&base_query=&city=&country=&region=&county=&query_options=&"
# Send a GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the JSON content
    data = response.json()
    # Get the list of jobs
    jobs_list = data['jobs']
    # Print the job titles
    for job in jobs_list:
        print(job['title'].strip())
else:
    print("Failed to retrieve the page. Status code:", response.status_code)
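The long query string above can also be passed as a params dictionary, which makes it easier to change offset and result_limit when fetching more than one page. The following is only a rough sketch under a few assumptions: that the endpoint still answers when the facets parameters are left out, and that increasing offset by result_limit returns the next page. Neither is confirmed here. The helper names (build_params, extract_titles, iter_titles) are made up for illustration; the parameter names and the 'jobs' and 'title' keys come from the code above.

```python
import requests

SEARCH_URL = "https://www.amazon.jobs/en/search.json"


def build_params(offset=0, limit=100):
    """Query parameters for one page; names are copied from the URL above."""
    return {
        "offset": offset,
        "result_limit": limit,
        "sort": "relevant",
        "base_query": "",
        "loc_query": "",
    }


def extract_titles(payload):
    """Return the stripped job titles from a decoded search.json payload."""
    return [job["title"].strip() for job in payload.get("jobs", [])]


def iter_titles(max_pages=3, limit=100, get=requests.get):
    """Yield titles page by page, stopping at the first short or empty page.

    Assumption: the server honours offset/result_limit paging.
    The `get` argument exists only so the function can be tried offline.
    """
    for page in range(max_pages):
        params = build_params(offset=page * limit, limit=limit)
        response = get(SEARCH_URL, params=params, timeout=30)
        response.raise_for_status()
        titles = extract_titles(response.json())
        yield from titles
        if len(titles) < limit:
            break
```

For example, `for title in iter_titles(max_pages=2): print(title)` would print the titles from up to two pages if the assumptions hold. Keeping max_pages small avoids sending many requests to the site while experimenting.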