How can I scrape this data with BS4? I tried html.parser, but it did not work.
My code:
for page in pages:
    page = cat[1] + "?s=%3Arelevance&page=" + str(page)
    page1 = requests.get(page)
    soup = BeautifulSoup(page1.content, "html.parser")
    data = [(re.find_all('div', attrs={'class':'prd'}), page1.text)]
    if not data:
        break
Sample of the data:
<div id="product-item" class="prd " data-product-id="125034323" data-product-name="Lenovo IdeaPad 5 Intel Core i3-1115G4 4 GB 256 GB SSD Integrated Intel UHD Graphics 14" FHD W11 Platinum Notebook Gri 82FE00LBTX" data-product-category="Bilgisayar ve Tablet" data-product-brand="Lenovo" data-product-price="5799.0" data-product-url="/lenovo-ideapad-5-intel-core-i31115g4-4-gb-256-gb-ssd-integrated-intel-uhd-graphics-14-fhd-w11-platinum-notebook-gri-82fe00lbtx-p-125034323" data-product-page-type="CATEGORY" data-product-position="1" data-product-subcategory="Laptop, Notebook" data-product-actual-price="0.0" data-product-discounted-price="5799.0" data-product-rating-score="" data-product-review-count="" data-product-occasion="N" data-product-photo-count="8" data-product-video="N" data-product-special="N" data-product-stock="Y" data-product-stock-status="Satışta" data-product-review="" data-product-variant="" data-category-name="Bilgisayar ve Tablet" data-facet-name="" data-facet-value="">
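In your code, find_all is called on re instead of on soup, and data is always a one-element list, so "if not data" never triggers. Call find_all on the soup object and read the data-* attributes from each matching div. You can adapt the code below as needed: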
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.teknosa.com/bilgisayar-tablet-c-116")
soup = BeautifulSoup(response.text, "html.parser")

data = []
# Each product is a <div class="prd"> that carries its details in data-* attributes.
for prd in soup.find_all('div', attrs={'class': 'prd'}):  # or soup.select(".prd")
    # Indexing a tag like a dict returns the attribute value as a string.
    product_id = prd['data-product-id']
    photo_count = prd['data-product-photo-count']
    name = prd['data-product-name']
    discounted_price = prd['data-product-discounted-price']
    url = prd['data-product-url']
    data.append([product_id, photo_count, name, discounted_price, url])
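To cover the paging loop from the question as well, here is a minimal sketch that combines the attribute extraction above with a "stop at the first empty page" check. The helper names parse_products, scrape_pages and fetch_category_page are hypothetical, and so are the first page number (0), the max_pages cap and the 10 second timeout. The query parameters are taken from the URL in the question, and I have not verified how the site actually numbers its pages.

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Return one dict per <div class="prd"> found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for prd in soup.select("div.prd"):
        # .get() returns None for a missing attribute instead of raising KeyError.
        price = prd.get("data-product-discounted-price")
        products.append({
            "id": prd.get("data-product-id"),
            "name": prd.get("data-product-name"),
            "price": float(price) if price else None,
            "url": prd.get("data-product-url"),
        })
    return products


def scrape_pages(fetch_html, first_page=0, max_pages=50):
    """Collect products page by page and stop at the first page without any.

    fetch_html is a hypothetical callable: page number -> HTML string.
    first_page and max_pages are assumptions, not values from the site.
    """
    collected = []
    for page in range(first_page, first_page + max_pages):
        products = parse_products(fetch_html(page))
        if not products:
            break
        collected.extend(products)
    return collected


def fetch_category_page(page, base_url="https://www.teknosa.com/bilgisayar-tablet-c-116"):
    """Download one category page (assumes the s/page parameters from the question)."""
    import requests  # imported here so the parsing helpers work without it

    response = requests.get(
        base_url, params={"s": ":relevance", "page": page}, timeout=10
    )
    response.raise_for_status()
    return response.text
```

Calling scrape_pages(fetch_category_page) would then walk the category until a page contains no div.prd elements. Passing the fetch function in as an argument also lets you test the parsing against saved HTML without hitting the site.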