How do I scrape all customer reviews?

Question (votes: 0, answers: 1)

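I am trying to scrape all the reviews on this page: https://www.backmarket.com/en-us/r/l/airpods/345c3c05-8a7b-4d4d-ac21-518b12a0ec17. The site says there are 753 reviews, but when I try to scrape them I only get 10. I don't know how to scrape all 753 reviews from the page. Here is my code: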

# importing modules 
import pandas as pd
from requests import get
from bs4 import BeautifulSoup

# Fetch the web page
url = 'https://www.backmarket.com/en-us/r/l/airpods/345c3c05-8a7b-4d4d-ac21-518b12a0ec17'
response = get(url)  # link excludes posts with no pictures
page = response.text

# Parse the HTML content
soup = BeautifulSoup(page, 'html.parser')

# To see different information
## reviewer's name 

reviewers_name = soup.find_all('p', class_='body-1-bold') 
[x.text for x in reviewers_name]

name = []

for items in reviewers_name:
    name.append(items.text if items else None)

## Purchase date

purchase_date = soup.find_all('p', class_='text-static-default-low body-2')
[x.text for x in purchase_date]

date = []
for items in purchase_date:
    date.append(items.text if items else None)


## Country 

country_text = soup.find_all('p', class_='text-static-default-low body-2 mt-32')
[x.text for x in country_text]

country = []

for items in country_text:
    country.append(items.text if items else None)


## Reviewed Products 

products_text = soup.find_all('span', class_= 'rounded-xs inline-block max-w-full truncate body-2-bold px-4 py-0 bg-static-default-mid text-static-default-hi')
[x.text for x in products_text]

products = []

for items in products_text:
    products.append(items.text if items else None)

## Actual Reviews 

review_text = soup.find_all('p',class_='body-1 block whitespace-pre-line')
[x.text for x in review_text]

review = []

for items in review_text:
    review.append(items.text if items else None)


## Review Ratings 

review_ratings_value = soup.find_all('span',class_='ml-4 mt-1 md:mt-2 body-2-bold')
[x.text for x in review_ratings_value]

review_ratings = []

for items in review_ratings_value:
    review_ratings.append(items.text if items else None)



# Create the Data Frame 
pd.DataFrame({
    'reviewers_name': name,
    'purchase_date': date,
    'country': country,
    'products': products,
    'review': review,
    'review_ratings': review_ratings
})

My question is: how can I scrape all of the reviews?

python beautifulsoup
1 Answer

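You can try the following, which requests the reviews from the site's JSON endpoint and follows the pagination cursor. Note that the site has "too many requests" protection, so when you receive HTTP status code 429 you have to wait a while before continuing: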

import time
import requests


url = "https://www.backmarket.com/reviews/product-landings/345c3c05-8a7b-4d4d-ac21-518b12a0ec17/products/reviews"

n, current_url = 1, url
while True:
    response = requests.get(current_url)

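    # too many requests? wait, then retry the same URL
    if response.status_code == 429:
        print("Too many requests...")
        time.sleep(2)
        continue

    # stop on any other HTTP error instead of failing inside .json()
    response.raise_for_status()
    data = response.json()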

    for r in data.get("results", []):
        print(n, "-" * 80)
        print(r["comment"])
        n += 1

    next_cursor = data.get("nextCursor")
    if not next_cursor:
        break

    current_url = f"{url}?cursor={next_cursor}"

This prints:


...
748 --------------------------------------------------------------------------------
Ordered space gray but received silver
749 --------------------------------------------------------------------------------
Lately, every item we’ve ordered has had to be returned. I’m pretty bummed.
750 --------------------------------------------------------------------------------
They broke.
751 --------------------------------------------------------------------------------
They’re weren’t noise cancelling like it says in the description so it wasn’t what I wanted but other than that they are great.
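
If you want the reviews collected rather than printed, the same cursor loop can be wrapped in a generator. This is only a sketch under a few assumptions: the names iter_reviews, fetch_page, fetch_with_requests, wait_seconds and max_retries are made up for illustration, the retry cap and linear back-off are my own additions, and the only response fields relied on are the ones used above ("results", "nextCursor", "comment").

```python
import time


def iter_reviews(fetch_page, base_url, wait_seconds=2.0, max_retries=5):
    """Yield each review dict, following "nextCursor" until it is absent.

    fetch_page(url) is a hypothetical callable that must return a
    (status_code, parsed_json_or_None) tuple.
    """
    current_url = base_url
    retries = 0
    while True:
        status, data = fetch_page(current_url)

        # Rate-limited: wait a bit longer each time, then retry the same URL.
        if status == 429:
            retries += 1
            if retries > max_retries:
                raise RuntimeError("still rate-limited after retries")
            time.sleep(wait_seconds * retries)
            continue
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        retries = 0

        yield from data.get("results", [])

        next_cursor = data.get("nextCursor")
        if not next_cursor:
            break
        # Same URL shape as in the loop above.
        current_url = f"{base_url}?cursor={next_cursor}"


def fetch_with_requests(url):
    """Hypothetical adapter so iter_reviews can be driven by requests."""
    import requests  # imported lazily; the paginator itself has no dependency

    response = requests.get(url, timeout=30)
    data = response.json() if response.status_code == 200 else None
    return response.status_code, data
```

With that in place, something like reviews = list(iter_reviews(fetch_with_requests, url)) should give a list of dicts that can be passed to pd.DataFrame(reviews). Which columns you get depends on what the endpoint actually returns, so inspect one item first.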