Trying to scrape products from AliExpress with Python 2.7


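I found the following Python code on the internet, but I'm not sure how to make it work. The author says: "In the terminal, navigate to the script's path and type:

python aliexpresscrape.py

Then type out the path to your file:

Path to File: /path/to/your/url/file"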

I'm a bit confused about where to enter the AliExpress product URLs.

One of the messages I get back from the terminal is:

Traceback (most recent call last):
  File "aliexpresscrape.py", line 70, in <module>
    read(selection)
  File "aliexpresscrape.py", line 64, in read
    with open(selection) as f:
IOError: [Errno 2] No such file or directory: '/Users/MyMacbookAir/Desktop/Songs\\  '

Here is the code:

from lxml import html
import lxml.html
import requests
import csv
from csv import writer
#variables
selection = raw_input("Path to File: ")
csv_header = ("post_title","post_name","ID","post_excerpt","post_content","post_status","menu_order","post_date","post_parent","post_author","comment_status","sku","downloadable","virtual","visibility","stock","stock_status","backorders","manage_stock","regular_price","sale_price","weight","length","width","height","tax_status","tax_class","upsell_ids","crosssell_ids","featured","sale_price_dates_from","sale_price_dates_to","download_limit","download_expiry","product_url","button_text","meta:_yoast_wpseo_focuskw","meta:_yoast_wpseo_title","meta:_yoast_wpseo_metadesc","meta:_yoast_wpseo_metakeywords","images","downloadable_files","tax:product_type","tax:product_cat","tax:product_tag","tax:product_shipping_class","meta:total_sales","attribute:pa_color","attribute_data:pa_color","attribute_default:pa_color","attribute:size","attribute_data:size","attribute_default:size")

#write header to output file (runs once); Python 2's csv module expects binary mode
with open('output.csv', 'wb') as f:
    writer=csv.writer(f)
    writer.writerow(csv_header)

def scrape(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    title2 = str(lxml.html.parse(url).find(".//title").text)
    title2 = title2.replace('-' + title2.split("-", 1)[1], '')
    price = tree.xpath("//span[@itemprop='price']//text()")
    i = 0
    for span in tree.cssselect('span'):
        clas = span.get('class')
        rel = span.get('rel')
        if clas == "packaging-des":
            if rel != None:
                if i == 0:
                    weight = rel
                elif i == 1:
                    dim = str(rel)
                i = i+1

    weight = weight
    height = dim.split("|", 3)[0]
    length = dim.split("|", 3)[1]
    width = dim.split("|", 3)[2]
    #Sometimes aliexpress doesn't list a price
    #This dumps a 0 into price in that case to stop the errors
    if len(price) == 1:
        price = float(str(price[0]))
    elif len(price) == 0:
        price = int(0)
    for inpu in tree.cssselect('input'):
        if inpu.get("id") == "hid-product-id":
            sku = inpu.get('value')
    for meta in tree.cssselect('meta'):
        name = meta.get("name")
        prop = meta.get("property")
        content = meta.get('content')
        if prop == 'og:image':
            image = meta.get('content')
        if name == 'keywords':
            keywords = meta.get('content')
        if name == 'description':
            desc = meta.get('content')
    listvar = ([str(title2),str(name), '', '', str(desc), 'publish', '', '', '0', '1', 'open', str(sku), 'no', 'no', 'visible', '', 'instock', 'no', 'no', str(price*2),str(price*1.5), str(weight), str(length), str(width), str(height), 'taxable', '', '', '', 'no', '', '', '', '', '', '', '', '', '', str(keywords), str(image), '', 'simple', '', '', '', '0', '', '', '', '', '', '', '', ''])
    with open("output.csv",'ab') as f:
        writer=csv.writer(f)
        writer.writerow(listvar)

def read(selection):
    lines = []
    j = 0
    with open(selection) as f:
        for line in f:
            lines.append(line)
        lines = map(lambda s: s.strip(), lines)    
    for j in range(len(lines)):
        scrape(str(lines[j]))
read(selection)
python web-scraping
1 Answer

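Without having seen the script's documentation (can you provide a link?), it looks like you are supposed to create a file containing the URLs of all the pages you want to scrape, one per line. For example, suppose I create a file that looks like this: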

https://www.aliexpress.com/category/100003070/men-clothing-accessories.html?spm=2114.11010108.102.1.jGW0U0
https://www.aliexpress.com/category/100003109/women-clothing-accessories.html?spm=2114.11010108.101.1.jGW0U0
https://www.aliexpress.com/category/509/phones-telecommunications.html?spm=2114.11010108.103.1.jGW0U0

and save it to

C:\ali.txt

Then, to start scraping these three links, I type in the terminal:

python aliexpresscrape.py

The first thing the program does is ask for the file name:

selection = raw_input("Path to File: ")

Now you can type the path to the file

C:\ali.txt

and press Enter.
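The traceback in the question shows the script trying to open '/Users/MyMacbookAir/Desktop/Songs\\  ', which looks like a folder name followed by a backslash and spaces rather than a text file of URLs. As a minimal sketch, the typed path could be tidied and checked before it is opened. The helpers clean_path and check_url_file are hypothetical (they are not part of the original script), and treating "\ " as an escaped space is an assumption that only makes sense for macOS/Linux paths, so it is switched off by default on Windows, where C:\ali.txt needs its backslash.

```python
import os


def clean_path(raw, unescape=(os.sep == "/")):
    """Tidy a path typed or dragged into the terminal (hypothetical helper)."""
    if unescape:
        # Assumption: on macOS/Linux a dragged path escapes spaces as "\ ".
        raw = raw.replace("\\ ", " ")
    # Drop surrounding whitespace and any quotes wrapped around the path.
    return raw.strip().strip("'\"")


def check_url_file(raw):
    """Return the cleaned path, or raise IOError if it is not a regular file."""
    path = clean_path(raw)
    if not os.path.isfile(path):
        raise IOError(
            "Not a file: %r (expected a text file with one URL per line)" % path
        )
    return path
```

In the script, the result of raw_input("Path to File: ") would be passed through check_url_file before calling read, so a folder path fails with a clearer message.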

Then, for each line the script finds in the text file, it calls scrape with that line's content (a URL) and starts scraping that URL.
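The loop described above can be sketched as follows. This is an illustrative rewrite of the script's read function, not the author's code: read_urls and scrape_all are hypothetical names, and skipping blank lines and collecting failures instead of stopping at the first error are additions the original does not make. The list comprehension behaves the same on Python 2.7 and Python 3, whereas the original map(...) followed by len(...) only works on Python 2.

```python
def read_urls(path):
    """Return the stripped, non-empty lines of a text file as a list."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def scrape_all(path, scrape):
    """Call scrape(url) once per URL in the file.

    Returns a list of (url, exception) pairs for the URLs that failed,
    so one bad page does not abort the whole run.
    """
    failed = []
    for url in read_urls(path):
        try:
            scrape(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            failed.append((url, exc))
    return failed
```

With the original script, this would be called as scrape_all(selection, scrape), and the returned list shows which URLs need another look.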
