How do I run a Scrapy crawl spider from the terminal?

Problem description · Votes: 0 · Answers: 1

I wrote my code following a tutorial and it is almost identical. The author runs it from the terminal and gets a .csv file as output, but when I run it, it just prints a long list of options for running the file, generates nothing I want, and apparently throws no error. What am I doing wrong? I run it like this:

(.venv) PS C:\Users\dodge\PycharmProjects\Scraper> scrapy runspider merclibre.py -o mercadolibre.csv -t csv

The output is this:

2024-12-25 19:47:05 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2024-12-25 19:47:05 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.11.0, Python 3.12.6 (tags/v3.12.6:a4a2d2b, Sep 6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)], pyOpenSSL 24.3.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-11-10.0.22631-SP0
Usage
=====
scrapy runspider [options] <spider_file>

Run the spider defined in the given file

Options
=======
-h, --help            show this help message and exit
-a NAME=VALUE         set spider argument (may be repeated)
-o FILE, --output FILE
                      append scraped items to the end of FILE (use - for
                      stdout), to define format set a colon at the end of the
                      output URI (i.e. -o FILE:FORMAT)
-O FILE, --overwrite-output FILE
                      dump scraped items into FILE, overwriting any existing
                      file, to define format set a colon at the end of the
                      output URI (i.e. -O FILE:FORMAT)

Global Options
--------------
--logfile FILE        log file. if omitted stderr will be used
-L LEVEL, --loglevel LEVEL
                    log level (default: DEBUG)
--nolog               disable logging completely
--profile FILE        write python cProfile stats to FILE
--pidfile FILE        write process ID to FILE
-s NAME=VALUE, --set NAME=VALUE
                    set/override setting (may be repeated)
--pdb                 enable pdb on failure
python web-scraping scrapy
1 Answer
  1. The spider class may not be defined correctly
  2. start_urls may not be set
  3. The parse method may not be implemented correctly

A typical Scrapy spider structure looks like this:

import scrapy

class MercadoLibreSpider(scrapy.Spider):
    name = 'mercadolibre'  # This is important!
    start_urls = ['your_start_url_here']
    
    def parse(self, response):
        # Your parsing logic here
        pass
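Each dict (or Item) that parse() yields becomes one row of the output file. The -o feed export does, in spirit, something like this stdlib sketch (illustrative only, not Scrapy's actual exporter code; the field names are made up):

```python
import csv
import io

# Items a spider's parse() might yield (hypothetical fields)
items = [
    {"title": "Notebook Lenovo", "price": "1200"},
    {"title": "Mouse Logitech", "price": "35"},
]

# Roughly what the CSV feed export does: take the field names from the
# items and write a header row followed by one row per yielded item.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(items[0].keys()))
writer.writeheader()
for item in items:
    writer.writerow(item)

csv_text = buf.getvalue()
print(csv_text)
```

The practical consequence: if parse() only contains pass (or never yields anything), the CSV file will be empty or missing even when the command itself is correct.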

If the spider is implemented correctly, your command is almost right, but -t is not a recognized option in Scrapy 2.12 (notice it is absent from the Options list in your output), which is why Scrapy prints the help text instead of crawling. Set the format through the output URI instead:

scrapy runspider merclibre.py -o mercadolibre.csv:csv

or let the format be inferred from the .csv extension:

scrapy runspider merclibre.py -o mercadolibre.csv

Alternatively, you can try running it with:

scrapy crawl mercadolibre -o mercadolibre.csv

However, the second command only works if your spider is part of a Scrapy project (with the proper project structure).
