I am trying to scrape the following URL: https://papemelroti.com/products/live-free-badge
but I can't seem to find this table class:
<table class="hulkapps-table table"><thead><tr><th style="border-top-left-radius: 0px;">Quantity</th><th style="border-top-right-radius: 0px;">Bulk Discount</th><th style="display: none">Add to Cart</th></tr></thead><tbody><tr><td style="border-bottom-left-radius: 0px;">Buy 50 + <span class="hulk-offer-text"></span></td><td style="border-bottom-right-radius: 0px;"><span class="hulkapps-price"><span class="money"><span class="money"> ₱1.00 </span></span> Off</span></td><td style="display: none;"><button type="button" class="AddToCart_0" style="cursor: pointer; font-weight: 600; letter-spacing: .08em; font-size: 11px; padding: 5px 15px; border-color: #171515; border-width: 2px; color: #ffffff; background: #161212;" onclick="add_to_cart(50)">Add to Cart</button></td></tr></tbody></table>
I already have my Selenium code, but it still doesn't scrape the table. Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
service = Service('/usr/local/bin/chromedriver')  # Adjust path if necessary
driver = webdriver.Chrome(service=service, options=chrome_options)

def get_page_html(url):
    driver.get(url)
    time.sleep(3)  # Wait for JS to load
    return driver.page_source

def scrape_discount_quantity(url):
    page_html = get_page_html(url)
    soup = BeautifulSoup(page_html, "html.parser")
    # Locate the table containing the quantity and discount
    table = soup.find('table', class_='hulkapps-table')
    print(page_html)
    if table:
        table_rows = table.find_all('tr')
        for row in table_rows:
            quantity_cells = row.find_all('td')
            if len(quantity_cells) >= 2:  # Check if there are at least two cells
                quantity_cell = quantity_cells[0].get_text(strip=True)  # Get quantity text
                discount_cell = quantity_cells[1].get_text(strip=True)  # Get discount text
                return quantity_cell, discount_cell
    return None, None

# Example usage
url = 'https://papemelroti.com/products/live-free-badge'
quantity, discount = scrape_discount_quantity(url)
print(f"Quantity: {quantity}, Discount: {discount}")
driver.quit()  # Close the browser when done
It keeps returning None.
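The discount data is loaded from the https://volumediscount.hulkapps.com/api/v2/shop/get_offer_table API endpoint. When you return the page source with Selenium's driver.page_source, the HTML that bs4 receives does not contain that table. I ran your code and confirmed that hulkapps-table is not present in the response, so soup.find() returns None and your function falls through to return None, None.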
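If you want to confirm this yourself without reading through the whole printed page source, here is a minimal sketch of a check for a tag with a given class. It uses only the standard library so it runs without extra dependencies; ClassFinder and has_element are names I made up for illustration, not part of Selenium or bs4.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether a given tag carrying a given CSS class appears."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated names,
        # e.g. class="hulkapps-table table".
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.found = True


def has_element(html, tag, css_class):
    """Return True if html contains <tag> with css_class among its classes."""
    finder = ClassFinder(tag, css_class)
    finder.feed(html)
    return finder.found
```

With your script you would call has_element(driver.page_source, "table", "hulkapps-table"). If it returns False, the table was never in the HTML that bs4 was given, and no change to the find() call will help.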
I used the https://volumediscount.hulkapps.com/api/v2/shop/get_offer_table API endpoint together with the product_id taken from this request: https://papemelroti.com/products/live-free-badge.json. Here is my code; it is basic:
import requests
import json

def getDiscount(root_url):
    prod_resp = requests.get(f'{root_url}.json').content  # Get product_id
    prod_id = json.loads(prod_resp)['product']['id']
    disc_url = 'https://volumediscount.hulkapps.com/api/v2/shop/get_offer_table'  # Discount URL
    data = f'pid={prod_id}&store_id=papemelroti.myshopify.com'
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:130.0) Gecko/20100101 Firefox/130.0",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
    }
    resp = requests.post(disc_url, data=data, headers=headers).content
    data_json = json.loads(resp)
    disc_json = json.loads(data_json['eligible_offer']['offer_levels'])[0]
    # Offer has two variants: 'Price' and 'Off' so you can use condition if you like to scrape products other than 'live-free-badge'
    if 'price_discount' in disc_json[2]:
        print(f"Product ID:{prod_id} (Quantity: {disc_json[0]}, Discount: {disc_json[1]} Price discount)")
    elif 'Off' in disc_json[2]:
        print(f"Product ID:{prod_id} (Quantity: {disc_json[0]}, Discount: {disc_json[1]}% Off)")

# sample for both 'Off' and 'Price'
getDiscount('https://papemelroti.com/products/dear-me-magnet')
getDiscount('https://papemelroti.com/products/live-free-badge')
Product ID:7217967726790 (Quantity: 50, Discount: 10% Off)
Product ID:104213217289 (Quantity: 50, Discount: 1.00 Price discount)
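If you plan to run this over many products, it may help to separate the parsing from the network calls so that a product without an offer does not raise an exception. The following is only a sketch: describe_offer is a hypothetical helper, and I have not checked what the endpoint returns for a product with no discount, so the handling of a missing or empty 'eligible_offer' is an assumption. The layout of each offer level (quantity, amount, kind) is taken from the indexing in the code above.

```python
import json


def describe_offer(data_json):
    """Format the first offer level of a decoded get_offer_table response.

    Returns None when no usable offer is present (assumed to show up as a
    missing or empty 'eligible_offer' / 'offer_levels').
    """
    offer = data_json.get("eligible_offer") or {}
    # 'offer_levels' is itself a JSON string, so it needs a second decode.
    levels = json.loads(offer.get("offer_levels") or "[]")
    if not levels or len(levels[0]) < 3:
        return None
    quantity, amount, kind = levels[0][:3]
    if "price_discount" in kind:
        return f"Quantity: {quantity}, Discount: {amount} Price discount"
    if "Off" in kind:
        return f"Quantity: {quantity}, Discount: {amount}% Off"
    return None
```

Inside getDiscount you could then replace the disc_json lines with something like text = describe_offer(data_json) and print it only when it is not None.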
Let me know if this works for you, or if you want to strictly use Selenium.