Python Selenium cannot get the product name when the name is too long

Question

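I am trying to scrape all products (name, image, price, and link) from https://www.nguyenkim.com/tim-kiem.html?tu-khoa=may+tinh. When a product name is too long, for example "Máy tính để bàn HP 205 Pro G4 AIO R5-4500U/8GB/256GB/Win10 31Y21PA", the product card cannot display it in full, and the name and price come back as empty strings. The image and link still return the correct values.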

Returned value when everything works:

Item(
    link = 'https://www.nguyenkim.com/chuot-logitech-m100r.html',
    name = 'Chuột máy tính Logitech M100R Đen',
    current_price = '109.000đ',
    place = 'Nguyen Kim',
    img = 'https://cdn.nguyenkimmall.com/images/thumbnails/210/210/detailed/177/10026584-chuot-logitech-m100r-den-1.jpg'
)

Returned value when the name is too long:

Item(
    link = 'https://www.nguyenkim.com/may-tinh-bang-xiaomi-redmi-pad-64gb-xam.html',
    name = '',
    current_price = '',
    place = 'Nguyen Kim',
    img = 'https://cdn.nguyenkimmall.com/images/thumbnails/210/210/detailed/847/10053972-may-tinh-bang-xiaomi-redmi-pad-64gb-xam.jpg'
)

What I have tried:

    from selenium.webdriver.common.by import By

    # `driver` (a WebDriver already on the search page) and `Item` are defined elsewhere
    content = driver.find_element(By.CLASS_NAME, 'result-wrapper')
    products = content.find_elements(By.CLASS_NAME, 'product')
    for product in products:
        item = Item(
            link = product.find_element(By.CSS_SELECTOR, "div[class*='product-header']").get_attribute('href'),
            # .text is what comes back empty for the long-named products
            name = product.find_element(By.CSS_SELECTOR, "div.product-title a").text,
            current_price = product.find_element(By.CSS_SELECTOR, "p[class*='final-price']").text,
            place = "Nguyen Kim",
            img = product.find_element(By.CSS_SELECTOR, "img").get_attribute('src')
        )
        print(item)

python html string selenium-webdriver css-selectors
Answer

Below is a simple way to get the full product names. You do not need the overhead of Selenium for this:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# Print full column contents instead of truncating long product names
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Send a browser-like User-Agent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

s = requests.Session()
s.headers.update(headers)

big_list = []
# Fetch result pages 1 and 2
for i in range(1, 3):
    r = s.get(f'https://www.nguyenkim.com/tim-kiem.html?tu-khoa=may+tinh&trang={i}')
    soup = bs(r.text, 'lxml')  # the 'lxml' parser requires the lxml package
    # Each product card is an <a> tag carrying the full name and URL as attributes
    all_prods = soup.select('a[class="product-render preloadcard"]')
    for prod in all_prods:
        big_list.append((prod.get('name'), prod.get('link')))
df = pd.DataFrame(big_list, columns=['product', 'url'])
print(df)

Terminal output:

    product     url
0   Máy tính để bàn HP 205 Pro G4 AIO R5-4500U/8GB/256GB/Win10 31Y21PA  https://www.nguyenkim.com/may-tinh-de-ban-hp-205-pro-g4-aio-31y21pa.html
1   Máy tính bảng Samsung Galaxy Tab A7 Lite 32GB Bạc   https://www.nguyenkim.com/may-tinh-bang-samsung-galaxy-tab-a7-lite-32gb-bac.html
2   Màn hình máy tính Acer R241Y    https://www.nguyenkim.com/man-hinh-may-tinh-acer-r241y.html
3   Chuột máy tính Logitech M100R Đen   https://www.nguyenkim.com/chuot-logitech-m100r.html
4   Máy tính bảng Xiaomi Redmi Pad 64GB Xám     https://www.nguyenkim.com/may-tinh-bang-xiaomi-redmi-pad-64gb-xam.html
5   Máy tính bảng Samsung Galaxy Tab A8 64GB Bạc (2022)     https://www.nguyenkim.com/may-tinh-bang-samsung-galaxy-tab-a8-64gb-bac-2022.html
6   Máy tính bảng Xiaomi Redmi Pad 64GB Bạc     https://www.nguyenkim.com/may-tinh-bang-xiaomi-redmi-pad-64gb-bac.html
7   Màn hình máy tính Samsung 24 inch LS24AM506NEXXV    https://www.nguyenkim.com/man-hinh-may-tinh-samsung-24-inch-ls24am506nexxv.html
8   Chuột máy tính Apple Magic Mouse MK2E3ZA/A Bạc  https://www.nguyenkim.com/chuot-may-tinh-apple-magic-mouse-mk2e3za-a-bac.html
9   Máy tính bảng Samsung Galaxy Tab S7 FE 64GB Xanh    https://www.nguyenkim.com/may-tinh-bang-samsung-galaxy-tab-s7-fe-64gb-xanh.html
10  Máy tính bảng Samsung Galaxy Tab A8 64GB Xám (2022)     https://www.nguyenkim.com/may-tinh-bang-samsung-galaxy-tab-a8-64gb-xam-2022.html
11  Máy tính bảng Nokia T20 4GB/64GB Xanh đại dương     https://www.nguyenkim.com/may-tinh-bang-nokia-t20-4gb-64gb-xanh-dai-duong.html
12  Máy tính bảng Xiaomi Redmi Pad 64GB Xanh    https://www.nguyenkim.com/may-tinh-bang-xiaomi-redmi-pad-64gb-xanh.html
13  Máy tính bảng OPPO Pad Air 64GB Xám     https://www.nguyenkim.com/may-tinh-bang-oppo-pad-air-64gb-xam.html
14  Máy tính bảng Samsung Galaxy Tab A7 Lite 32GB Xám   https://www.nguyenkim.com/may-tinh-bang-samsung-galaxy-tab-a7-lite-32gb-xam.html
15  Loa máy tính Microlab B-77BT    https://www.nguyenkim.com/loa-vi-tinh-microlab-b-77bt.html
16  Chuột không dây Logitech M185 Đỏ    https://www.nguyenkim.com/chuot-khong-day-logitech-m185-do.html
17  CHUỘT KHÔNG DÂY ELECOM M-IR07DRWH   https://www.nguyenkim.com/chuot-khong-day-elecom-m-ir07dr.html?pid=61165
18  iPad Pro M1 2021 12.9 inch Wifi Cellular 128GB MHR53ZA/A Xám    https://www.nguyenkim.com/ipad-pro-m1-2021-wifi-cellular-128gb-mhr53za-a-xam.html
19  Chuột không dây Logitech B175   https://www.nguyenkim.com/chuot-khong-day-logitech-b175-den.html
20  CHUỘT KHÔNG DÂY ELECOM M-IR07DRRD   https://www.nguyenkim.com/chuot-khong-day-elecom-m-ir07drrd.html
21  Lõi lọc Sunhouse số 1   https://www.nguyenkim.com/loi-loc-sunhouse-so-1.html
22  Chuột không dây Logitech M185 Xanh Dương    https://www.nguyenkim.com/chuot-khong-day-logitech-m185-xanh-duong.html
23  CHUỘT KHÔNG DÂY ELECOM M-IR07DRBU   https://www.nguyenkim.com/chuot-khong-day-elecom-m-ir07dr.html
24  Tai nghe Microlab K360  https://www.nguyenkim.com/tai-nghe-microlab-k360.html
25  Tai nghe vi tính Logitech H150 Xanh     https://www.nguyenkim.com/tai-nghe-logitech-h150-xanh.html
26  Tai nghe vi tính Soundmax AH-304    https://www.nguyenkim.com/tai-nghe-vi-tinh-soundmax-ah-304.html
27  Tai nghe vi tính Logitech H111  https://www.nguyenkim.com/tai-nghe-nhac-logitech-h111.html
28  CHUỘT ELECOM BLUELED M-BL16UBWH     https://www.nguyenkim.com/chuot-elecom-blueled-m-bl16ubwh.html
29  iPad Pro M1 2021 11 inch Wifi 512GB MHQX3ZA/A Bạc   https://www.nguyenkim.com/ipad-pro-m1-2021-wifi-512gb-mhqx3za-a-bac.html
30  Chuột không dây Logitech M185 Xám   https://www.nguyenkim.com/chuot-khong-day-logitech-m185-xam.html
31  CHUỘT KHÔNG DÂY ELECOM M-IR07DRBK   https://www.nguyenkim.com/chuot-khong-day-elecom-m-ir07drbk.html
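
The answer's loop works because it reads each product name from an attribute on the card's `<a>` tag rather than from the text shown on the page, so truncation in the rendered card does not matter. The idea can be tried offline with the sketch below. Note the assumptions: the HTML fragment is made up for illustration and mirrors only the class, `name`, and `link` attributes the answer relies on, and the parsing uses Python's standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages. `ProductAttrParser` and `extract_products` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical markup: only the class, `name`, and `link` attributes mirror the answer.
SAMPLE_HTML = """
<div class="result-wrapper">
<a class="product-render preloadcard" name="Máy tính để bàn HP 205 Pro G4 AIO R5-4500U/8GB/256GB/Win10 31Y21PA" link="https://www.nguyenkim.com/may-tinh-de-ban-hp-205-pro-g4-aio-31y21pa.html"></a>
<a class="product-render preloadcard" name="Chuột máy tính Logitech M100R Đen" link="https://www.nguyenkim.com/chuot-logitech-m100r.html"></a>
<a class="other-link" name="ignored" link="https://example.com/"></a>
</div>
"""


class ProductAttrParser(HTMLParser):
    """Collect (name, link) attribute pairs from <a> tags with the product-render class."""

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "product-render" in classes:
            self.products.append((attr_map.get("name"), attr_map.get("link")))


def extract_products(html):
    """Return a list of (name, link) tuples found in the given HTML string."""
    parser = ProductAttrParser()
    parser.feed(html)
    parser.close()
    return parser.products


products = extract_products(SAMPLE_HTML)
for name, link in products:
    print(name, link)
```

The long HP product name comes back in full because it is taken from the attribute value, not from rendered text. Whether the live site still exposes these attributes is something to verify against the actual page source.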

See the BeautifulSoup documentation and the Requests documentation for details.
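
If you would rather stay with Selenium, one possible explanation for the empty strings is that `WebElement.text` returns only text that is actually rendered, so an element that is present in the DOM but not displayed yields `''`. The sketch below is a minimal fallback under that assumption: it tries `.text` first and then reads the DOM `textContent` property through `get_attribute`. `read_text` is a hypothetical helper, and `FakeElement` is a stand-in used only so the example runs without a browser. If the site fills the title and price in lazily, the fallback will also return an empty string.

```python
def read_text(element):
    """Return the rendered text, or the raw DOM text when nothing is rendered."""
    visible = (element.text or "").strip()
    if visible:
        return visible
    # textContent is a DOM property; Selenium exposes it through get_attribute()
    raw = element.get_attribute("textContent") or ""
    # Collapse the surrounding and repeated whitespace that raw DOM text often carries
    return " ".join(raw.split())


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, used only to demo read_text()."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


# An element whose text is in the DOM but not rendered, and one that renders normally
hidden = FakeElement("", "\n   Máy tính bảng Xiaomi Redmi Pad 64GB Xám  \n")
shown = FakeElement("109.000đ", " 109.000đ ")
print(read_text(hidden))
print(read_text(shown))
```

In the question's loop this would replace the direct `.text` access, for example `read_text(product.find_element(By.CSS_SELECTOR, "div.product-title a"))`, where `product` is one card element from the results.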
