Scraping phone models with links using Python

Question  Votes: 0  Answers: 3

I am trying to scrape the list of phone models from this website: https://www.m1.com.sg/personal/mobile/phones/filters/all-plans/all/all/0/1500/0/0/none

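The page lists the models and their prices. I have the following code, but none of the prices are correct: they should not all be zero. What am I doing wrong?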

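Also, is it possible, using only Beautiful Soup, to provide clickable links, so that users can click "More Info" and be taken to a page with additional information about that phone model? For example: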

 iPhone XR 128GB
   $ 0 
   More Info

import urllib.request
from bs4 import BeautifulSoup
from html.parser import HTMLParser

url_toscrape = "https://www.m1.com.sg/personal/mobile/phones/filters/all-plans/all/all/0/1500/0/0/none"
response = urllib.request.urlopen(url_toscrape)
info_type = response.info()
responseData = response.read()
soup = BeautifulSoup(responseData, 'lxml')

Model_findall=soup.findAll("div",{"class":"td three title text-center"})
price_findall=soup.findAll("div",{"class":"td two price text-center"})


for models in Model_findall:
    print('*',models.text.strip())
    print(' ',price.text.strip())

What I get:

* iPhone XR 128GB
  $ 0
* iPhone XR 256GB
  $ 0
* iPhone XR 64GB
  $ 0
* iPhone XS 256GB
  $ 0
* iPhone XS 512GB
  $ 0
* iPhone XS 64GB
  $ 0
* iPhone XS Max 256GB
  $ 0
* iPhone XS Max 512GB
  $ 0
* iPhone XS Max 64GB
  $ 0
* ASUS ZenFone 5Q
  $ 0
* ASUS ZenFone Live L1
  $ 0
* BlackBerry KEY2
  $ 0
* BlackBerry KEY2 LE
  $ 0
* BlackBerry KEYone Dual SIM
  $ 0
* Huawei Mate 20
  $ 0
* Huawei Mate 20 Pro
  $ 0
* Huawei Mate 20 X
  $ 0
* Huawei Nova 3i
  $ 0
* Huawei P20
  $ 0
* Huawei P20 Pro
  $ 0
* Huawei Y6 2018
  $ 0
* Huawei Y6 Pro 2019
  $ 0
* iPhone 7 (32GB)
  $ 0
* iPhone 7 Plus (32GB)
  $ 0
* Lenovo Tab 7 Essential (LTE)
  $ 0
* LG G7+ ThinQ
  $ 0
* LG V40 ThinQ
  $ 0
* OPPO AX7
  $ 0
* OPPO Find X (256GB)
  $ 0
* OPPO R17
  $ 0
* OPPO R17 Pro
  $ 0
* Samsung Galaxy A7
  $ 0
* Samsung Galaxy A9
  $ 0
* Samsung Galaxy J4+
  $ 0
* Samsung Galaxy J6+
  $ 0
* Samsung Galaxy J7 Duo
  $ 0
* Samsung Galaxy Note9 128GB
  $ 0
* Samsung Galaxy Note9 512GB
  $ 0
* Samsung Galaxy S10 128GB
  $ 0
* Samsung Galaxy S10+ 128GB
  $ 0
* Samsung Galaxy S10+ 1TB
  $ 0
* Samsung Galaxy S10+ 512GB
  $ 0
* Samsung Galaxy S10e 128GB
  $ 0
* Samsung Galaxy S9 64GB
  $ 0
* Samsung Galaxy Tab A (2018) 10.5"
  $ 0
* Samsung Galaxy Tab A 7.0
  $ 0
* Samsung Galaxy Tab S4 256GB
  $ 0
* Samsung Galaxy Tab S4 64GB
  $ 0
* vivo Nex Dual Screen Edition
  $ 0
* vivo V11
  $ 0
* vivo Y95
  $ 0
* Xiaomi Mi A2
  $ 0
* Xiaomi Redmi Note 6 Pro
  $ 0
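In the code above, the loop iterates only over `Model_findall`; `price` is never assigned inside it, so every iteration prints whatever `price` happened to hold before the loop ran. That would likely explain the identical `$ 0` on every line (in a fresh interpreter the same loop would raise `NameError`). A minimal sketch of pairing each model with its own price by zipping the two result lists, run against a small hypothetical HTML snippet that only reuses the class names from the question (the real page markup and prices may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class names used in the question;
# the prices are made-up sample values, not data from the M1 site.
SAMPLE_HTML = """
<div class="td three title text-center">Phone A</div>
<div class="td two price text-center">$ 100</div>
<div class="td three title text-center">Phone B</div>
<div class="td two price text-center">$ 250</div>
"""


def list_models_and_prices(html):
    """Return (model, price) pairs by walking both result lists in step."""
    soup = BeautifulSoup(html, "html.parser")
    models = soup.find_all("div", {"class": "td three title text-center"})
    prices = soup.find_all("div", {"class": "td two price text-center"})
    # zip() advances both lists together, so each model gets its own price
    return [(m.text.strip(), p.text.strip()) for m, p in zip(models, prices)]


for model, price in list_models_and_prices(SAMPLE_HTML):
    print("*", model)
    print(" ", price)
```

The built-in `"html.parser"` is used here only so the sketch does not depend on `lxml`; `'lxml'` works the same way if it is installed.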

Thank you very much!

python web-scraping beautifulsoup
3 Answers
0 votes

The following script should give you the output you want.

import requests
from bs4 import BeautifulSoup

url = "https://www.m1.com.sg/personal/mobile/phones/filters/all-plans/all/all/0/1500/0/0/none"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
# Iterate over each element with class "phone-line" (one per phone)
for items in soup.find_all(class_="phone-line"):
    # Look up the model name and the price within the same row
    model = items.find(class_="title").text.strip()
    price = items.find(class_="light-blue").text.strip()
    print(model, price)
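`items.find(...)` returns `None` when a row lacks the element, and `.text` on `None` raises `AttributeError`. A sketch of the same row-by-row approach with a guard, using CSS selectors through `select_one`. The sample HTML is hypothetical and only reuses the class names from this answer (`phone-line`, `title`, `light-blue`); the helper names are illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one "phone-line" block per phone, as this answer assumes.
# The second phone deliberately has no price element.
SAMPLE_HTML = """
<div class="phone-line"><div class="title">Phone A</div><span class="light-blue">$ 100</span></div>
<div class="phone-line"><div class="title">Phone B</div></div>
"""


def text_or_default(parent, selector, default=""):
    """Return the stripped text of the first match, or default if none."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def scrape_rows(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select(".phone-line"):
        # Model and price come from the same row, so they cannot drift apart
        rows.append((text_or_default(item, ".title"),
                     text_or_default(item, ".light-blue", "n/a")))
    return rows


for model, price in scrape_rows(SAMPLE_HTML):
    print(model, price)
```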

0 votes

Do you mean something like this?

import urllib.request
from bs4 import BeautifulSoup

url_toscrape = "https://www.m1.com.sg/personal/mobile/phones/filters/all-plans/all/all/0/1500/0/0/none"
response = urllib.request.urlopen(url_toscrape)
soup = BeautifulSoup(response.read(), 'lxml')

# Each "tr middle" div is one row of the phone table
for tr in soup.find_all("div", {"class": "tr middle"}):
    # Reset per row so a missing cell does not reuse the previous row's value
    model = price = info = ""
    for model_div in tr.find_all("div", {"class": "td three title text-center"}):
        model = model_div.text.strip()
    for price_div in tr.find_all("div", {"class": "td two price text-center"}):
        price = price_div.text.strip()
    for info_div in tr.find_all("div", {"class": "td two description"}):
        for link in info_div.find_all("a"):
            # Turn the relative href into a full URL, escaping spaces
            info = info_div.text.strip() + ": https://www.m1.com.sg" + link['href'].replace(" ", "%20")
    print(model, price, info)
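The hard-coded `https://www.m1.com.sg` prefix plus `.replace(" ", "%20")` only handles spaces. A sketch that percent-encodes each `href` with `urllib.parse.quote`, resolves it with `urllib.parse.urljoin`, and then uses Beautiful Soup's `new_tag` to write an HTML page in which every "More Info" is a real link, so opening the file in a browser gives clickable entries. `BASE_URL`, the helper names and the sample href are assumptions for illustration, not taken from the site:

```python
from urllib.parse import quote, urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.m1.com.sg/"  # assumed base for relative hrefs


def absolute_link(href):
    """Percent-encode unsafe characters (spaces etc.) and resolve against BASE_URL."""
    # safe= keeps URL delimiters and existing %-escapes intact
    return urljoin(BASE_URL, quote(href, safe="/:?=&%#"))


def build_page(rows):
    """rows: iterable of (model, price, href) tuples -> HTML string."""
    page = BeautifulSoup("<html><body><ul></ul></body></html>", "html.parser")
    ul = page.ul
    for model, price, href in rows:
        li = page.new_tag("li")
        li.append(f"{model} {price} ")
        link = page.new_tag("a", href=absolute_link(href))
        link.string = "More Info"
        li.append(link)
        ul.append(li)
    return str(page)


# Hypothetical row; the href format is illustrative only.
html = build_page([("Phone A", "$ 100", "/personal/mobile/phones/Phone A")])
print(html)
```

Saving `html` to a file (for example `phones.html`) and opening it in a browser should show each model and price followed by a clickable More Info link.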

0 votes

You can use the following CSS class and ID selectors:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.m1.com.sg/personal/mobile/phones/filters/all-plans/all/all/0/1500/0/0/none"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

# Model names: "color-orange" elements inside the element with id "PhoneListDiv"
models = [item.text.strip() for item in soup.select('#PhoneListDiv .color-orange')]
# Prices: "light-blue" elements nested inside a "price" element
prices = [item.text.strip() for item in soup.select('.price .light-blue')]
df = pd.DataFrame(list(zip(models, prices)), columns=['Model', 'Price'])
print(df)
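Zipping two independently selected lists assumes they line up one-to-one. If one phone has no price element, `zip` silently stops at the shorter list and later models get paired with the wrong price. A small stdlib-only sketch (the function name and values are hypothetical) that refuses to pair the lists when their lengths differ:

```python
def pair_columns(models, prices):
    """Pair models with prices, refusing to guess when the counts differ."""
    if len(models) != len(prices):
        # zip() would truncate here and misalign the remaining rows
        raise ValueError(
            f"found {len(models)} models but {len(prices)} prices; "
            "extract model and price per row instead of zipping"
        )
    return list(zip(models, prices))


print(pair_columns(["Phone A", "Phone B"], ["$ 100", "$ 250"]))
```

The returned list can be passed to `pd.DataFrame(..., columns=['Model', 'Price'])` exactly as in the answer above.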
© www.soinside.com 2019 - 2024. All rights reserved.