How to scrape all product reviews using Python

Question · Votes: 0 · Answers: 3

I'm currently scraping product reviews on this website: https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1

So far I've only managed to get the reviews on the first page.

import pandas as pd
from urllib.request import Request, urlopen as uReq  # standard-library HTTP client
from bs4 import BeautifulSoup as soup

def make_soup(website):
    # Send a browser-like User-Agent so the request is less likely to be rejected
    req = Request(website, headers={'User-Agent': 'Mozilla/5.0'})
    with uReq(req) as uClient:
        page_html = uClient.read()
    return soup(page_html, 'html.parser')

lazada_url = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1'

website = make_soup(lazada_url)
news_headlines = pd.DataFrame(columns=['reviews', 'sentiment', 'score'])
headlines = website.find_all('div', attrs={"class": "item-content"})
for n, item in enumerate(headlines):
    top = item.div  # the first <div> inside each review block holds the review text
    text_headlines = top.text
    print(text_headlines)
    print()
    # Store the text in the 'reviews' column declared above (not a new 'title' column)
    news_headlines.loc[n, 'reviews'] = text_headlines

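The result covers only the first page. How do I handle all the pages? There is no page number in the URL that I could loop over. You can check the URL yourselves. Thanks :)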

I like this phone very much and it's global version. I recommend this phone for who like gaming. Delivery just took 3 days only. Thanks Lazada

Item was received in just two days and was wonderfully wrapped. Thanks for the excellent services Lazada!

Very happy with the phone. It's original, it arrived in good condition. Built quality is superb for a budget phone.

The delivery is very fast just take one day to reach at my home. However, the tax invoice is not attached. How do I get the tax invoice?

great deal from lazada. anyway, i do not find any tax invoice. please do email me the tax invoice. thank you.
python selenium web-scraping beautifulsoup
3 answers
2 votes

You can scrape the pagination buttons at the bottom of the reviews to find the first and last page numbers:

import requests
from bs4 import BeautifulSoup as soup

BASE_URL = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search={}'

def get_page_reviews(content: soup) -> dict:
    # Each review is a div.item inside the div.mod-reviews container
    rs = content.find('div', {'class': 'mod-reviews'}).find_all('div', {'class': 'item'})
    reviews = [i.find('div', {'class': 'item-content'}).find('div', {'class': 'content'}).text for i in rs]
    # The star rating is drawn as one <img> per star inside div.top
    stars = [len(c.find('div', {'class': 'top'}).find_all('img')) for c in rs]
    _by = [i.find('div', {'class': 'middle'}).find('span').text for i in rs]
    return {'stars': stars, 'reviews': reviews, 'authors': _by}

d = soup(requests.get(BASE_URL.format(1)).text, 'html.parser')
# Keep only numeric button labels so int() does not fail on labels such as "Next"
labels = [i.text.strip() for i in d.find_all('button', {'class': 'next-pagination-item'})]
results = [int(t) for t in labels if t.isdigit()] or [1]  # fall back to a single page
final_result = []
for i in range(min(results), max(results) + 1):
    # Assumption carried over from this answer: the page number goes into the `search`
    # query parameter; check that this actually changes the reviews returned
    new_url = BASE_URL.format(i)
    r = get_page_reviews(soup(requests.get(new_url).text, 'html.parser'))
    # Accumulate across pages instead of overwriting the list on every iteration
    final_result.extend({'stars': a, 'author': b, 'review': c}
                        for a, b, c in zip(r['stars'], r['authors'], r['reviews']))

Output (first page):

[{'stars': 5, 'author': 'by Ridwan R.', 'review': "I like this phone very much and it's global version. I recommend this phone for who like gaming. Delivery just took 3 days only. Thanks Lazada"}, {'stars': 5, 'author': 'by Razli A.', 'review': 'Item was received in just two days and was wonderfully wrapped. Thanks for the excellent services Lazada!'}, {'stars': 5, 'author': 'by Nur F.', 'review': "Very happy with the phone. It's original, it arrived in good condition. Built quality is superb for a budget phone."}, {'stars': 5, 'author': 'by Muhammad S.', 'review': 'The delivery is very fast just take one day to reach at my home. However, the tax invoice is not attached. How do I get the tax invoice?'}, {'stars': 5, 'author': 'by Xavier Y.', 'review': 'great deal from lazada. anyway, i do not find any tax invoice. please do email me the tax invoice. thank you.'}]
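If BeautifulSoup is not available, or to make the page-number step tolerant of non-numeric labels such as "Next", here is a minimal stdlib-only sketch of the same idea. It assumes, as the answer does, that the page buttons carry the class next-pagination-item and that page labels are plain digits; PaginationParser and page_numbers are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class PaginationParser(HTMLParser):
    """Collects the text of <button> elements whose class list contains 'next-pagination-item'."""

    def __init__(self):
        super().__init__()
        self._in_button = False
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == 'button':
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get('class') or '').split()
            if 'next-pagination-item' in classes:
                self._in_button = True
                self.labels.append('')

    def handle_endtag(self, tag):
        if tag == 'button':
            self._in_button = False

    def handle_data(self, data):
        # Text of nested elements (e.g. a <span> inside the button) is included too
        if self._in_button:
            self.labels[-1] += data


def page_numbers(html):
    """Return the numeric page labels found in the pagination, skipping 'Previous'/'Next'."""
    parser = PaginationParser()
    parser.feed(html)
    parser.close()
    return [int(t.strip()) for t in parser.labels if t.strip().isdigit()]
```

With the returned list, range(min(nums), max(nums) + 1) gives the pages to visit, provided the list is not empty.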

2 votes

All you need to do is use the click() method in Selenium.

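Selenium is a portable testing framework for web applications that drives a real browser, so you can load pages, interact with them, and read the resources you need.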

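The given URL has pagination buttons for the reviews, so just locate the button by XPath, class, or id and call .click() on it; this takes you to the next page. Older Selenium versions used find_element_by_xpath(...) and similar helpers; those were removed in Selenium 4, where you call find_element(By.XPATH, ...) instead.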

Here is my sample code :D

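from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import time


url = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1'

chrome_options = Options()
#chrome_options.add_argument("--headless")
# Selenium 4: pass the chromedriver path through a Service object and the options via options=
browser = webdriver.Chrome(service=Service('/Users/baejihwan/Documents/chromedriver'),
                           options=chrome_options)
browser.get(url)
time.sleep(2)  # give the JavaScript-rendered review section time to load; adjust as needed

page_soup = soup(browser.page_source, 'html.parser')
headlines = page_soup.find_all('div', attrs={"class": "item-content"})

for item in headlines:
    top = item.div  # first <div> inside each review block holds the review text
    text_headlines = top.text
    print(text_headlines)

# Click the pagination button at this XPath to move to the next page of reviews
browser.find_element(By.XPATH, '//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[2]').click()
time.sleep(2)  # wait for the new reviews to replace the old ones before reading the page

page_soups = soup(browser.page_source, 'html.parser')
headline = page_soups.find_all('div', attrs={"class": "item-content"})

for item in headline:
    top = item.div
    text_headlines = top.text
    print(text_headlines)

browser.quit()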

Output

I like this phone very much and it's global version. I recommend this phone for who like gaming. Delivery just took 3 days only. Thanks Lazada

Item was received in just two days and was wonderfully wrapped. Thanks for the excellent services Lazada!

Very happy with the phone. It's original, it arrived in good condition. Built quality is superb for a budget phone.

The delivery is very fast just take one day to reach at my home. However, the tax invoice is not attached. How do I get the tax invoice?

great deal from lazada. anyway, i do not find any tax invoice. please do email me the tax invoice. thank you.

Penghantaran cepat. Order ahad malam, sampai rabu pagi. Tu pun sbb selasa cuti umum. 
Fon disealed dgn bubble wrap dan box.
Dah check mmg original malaysia.
Dpt free tempered glass. Ok je.
Fon so far pakai ok.
Selama ni pakai iphone, bila pakai android ni kekok sikit. 
invoice tidak disertakan.
battery dia dikira cpt juga hbs.. 

Saya telah beli smartphone xioami mi a1 dan telah terima hari ni. Tetapi telefon itu telah rosak. Tidak dapat on.

beli pada 1/6 dgn harga rm599 dpt free gift usb otg type c 64gb jenama sandisk.
delivery pantas, order 1/6 sampai 4/6 tu pon sebab weekend ja kalau x mesti order harini esk sampai dah.
packaging terbaik, dalam kotak ada air bag so memang secure.
kotak fon sealed, dlm kotak dapat screen protector biasa free, kabel type c dgn charger 3 pin.
keluar kotak terus update ke Android oreo, memang puas hati la overall. memang berbaloi sangat beli. Kudos to lazada.

i submitted the order on on sunday and i get it tuesday morning, even the despatch guy called me at 830am just to make sure if im already at the office. super reliable. for the phone, well i got it for RM599. what could you possibly asked for more? hehehe

Purchased Xiaomi Mi A1 from Official store with an offer of "Free gift SanDisk Ultra 64GB Dual USB Drive 3.0 OTG Type C Flash Drive". But they delivered only USB drive 2.0 

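I did this in an extremely naive way! It's better to define a function that reads the HTML and parses the data you want. This code only parses the reviews up to page 2; you can modify it to collect every review through the last page! :D If you have questions about this code, please leave a comment!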
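Following the suggestion to wrap the parsing in a function, a minimal sketch using only the standard library's html.parser. It mirrors the item.div lookup above by taking the text of the first <div> nested inside each div.item-content, then strips surrounding whitespace. The class name comes from the answers; ReviewTextParser and parse_reviews are hypothetical names, and Lazada's real markup may differ or change.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collects the text of the first <div> nested inside each <div class="item-content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0              # current nesting depth of <div> elements
        self.awaiting = False       # inside an item-content div, first inner div not seen yet
        self.container_depth = None # depth of the item-content div being examined
        self.capture_depth = None   # depth of the div whose text is being collected
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        self.depth += 1
        classes = (dict(attrs).get('class') or '').split()
        if self.awaiting:
            # First div inside item-content: start collecting its text
            self.awaiting = False
            self.capture_depth = self.depth
            self.reviews.append('')
        elif 'item-content' in classes and self.capture_depth is None:
            self.awaiting = True
            self.container_depth = self.depth

    def handle_endtag(self, tag):
        if tag != 'div':
            return
        if self.capture_depth == self.depth:
            self.capture_depth = None
        elif self.awaiting and self.depth == self.container_depth:
            self.awaiting = False   # item-content closed without an inner div
        self.depth -= 1

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.reviews[-1] += data


def parse_reviews(html):
    """Return the review texts found in one page of HTML."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return [r.strip() for r in parser.reviews]
```

Calling parse_reviews(browser.page_source) after each click would then replace the two copied parsing loops.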

Hope this helps!


-2 votes

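To scrape all product reviews from a site like Lazada with Python, first set up Selenium with the Chrome driver, optionally in headless mode (no visible browser window). Load the product page and use Beautiful Soup to extract the reviews from the current page. To handle pagination, locate the "next page" button and click it to move to the next page, collecting the reviews each time. Repeat until there are no more pages. This way you capture the reviews from every page.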
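The "click next until there are no more pages" loop described above can be sketched independently of Selenium. This is a minimal sketch under stated assumptions: collect_all_pages is a hypothetical helper, and the three callables are supplied by you (for example, get_html returning browser.page_source and click_next clicking the next-page button).

```python
from typing import Callable, List, Optional


def collect_all_pages(get_html: Callable[[], str],
                      parse: Callable[[str], List[str]],
                      click_next: Callable[[], bool],
                      max_pages: Optional[int] = None) -> List[str]:
    """Parse the current page, then keep moving forward until click_next() reports no further page.

    get_html   returns the HTML of the page currently shown
    parse      turns that HTML into a list of reviews
    click_next moves to the next page and returns True, or returns False when there is none
    max_pages  optional safety limit on how many pages are parsed
    """
    results: List[str] = []
    pages = 0
    while True:
        results.extend(parse(get_html()))
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        if not click_next():
            break
    return results
```

With Selenium, click_next could wrap browser.find_element(By.XPATH, ...).click() followed by a short wait, and return False when selenium.common.exceptions.NoSuchElementException is raised or the button is disabled; which condition marks the last page on Lazada is an assumption you would need to check in the page itself.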

© www.soinside.com 2019 - 2024. All rights reserved.