How do I run a Scrapy crawl spider from the terminal?

Problem description · Votes: 0 · Answers: 1

I wrote my code following a tutorial and it is almost identical. The author runs it from the terminal and gets a .csv file as output, but when I run it, it just prints a long list of options for running the file, generates nothing I want, and apparently throws no error. What am I doing wrong? I run it like this:

(.venv) PS C:\Users\dodge\PycharmProjects\Scraper> scrapy runspider merclibre.py -o mercadolibre.csv -t csv

The output is this:

2024-12-25 19:47:05 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2024-12-25 19:47:05 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.11.0, Python 3.12.6 (tags/v3.12.6:a4a2d2b, Sep 6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)], pyOpenSSL 24.3.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-11-10.0.22631-SP0
Usage
=====
scrapy runspider [options] <spider_file>

Run the spider defined in the given file

Options
=======
-h, --help            show this help message and exit
-a NAME=VALUE         set spider argument (may be repeated)
-o FILE, --output FILE
                      append scraped items to the end of FILE (use - for
                      stdout), to define format set a colon at the end of the
                      output URI (i.e. -o FILE:FORMAT)
-O FILE, --overwrite-output FILE
                      dump scraped items into FILE, overwriting any existing
                      file, to define format set a colon at the end of the
                      output URI (i.e. -O FILE:FORMAT)

Global Options
--------------
--logfile FILE        log file. if omitted stderr will be used
-L LEVEL, --loglevel LEVEL
                    log level (default: DEBUG)
--nolog               disable logging completely
--profile FILE        write python cProfile stats to FILE
--pidfile FILE        write process ID to FILE
-s NAME=VALUE, --set NAME=VALUE
                    set/override setting (may be repeated)
--pdb                 enable pdb on failure
python web-scraping scrapy
1 Answer
  1. The spider class may not be defined correctly
  2. start_urls may not be set
  3. The parse method may not be implemented correctly

A typical Scrapy spider structure looks like this:

import scrapy

class MercadoLibreSpider(scrapy.Spider):
    name = 'mercadolibre'  # This is important!
    start_urls = ['your_start_url_here']
    
    def parse(self, response):
        # Your parsing logic here
        pass
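Each dict (or Item) that parse() yields becomes one row of the output file. The -o feed export does, in spirit, something like this stdlib sketch (illustrative only, not Scrapy's actual exporter code; the field names are made up):

```python
import csv
import io

# Items a spider's parse() might yield (hypothetical fields)
items = [
    {"title": "Notebook Lenovo", "price": "1200"},
    {"title": "Mouse Logitech", "price": "35"},
]

# Roughly what the CSV feed export does: take the field names from the
# items and write a header row followed by one row per yielded item.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(items[0].keys()))
writer.writeheader()
for item in items:
    writer.writerow(item)

csv_text = buf.getvalue()
print(csv_text)
```

The practical consequence: if parse() only contains pass (or never yields anything), the CSV file will be empty or missing even when the command itself is correct.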

If the spider is implemented correctly, your command is almost right, but -t is not a recognized option in Scrapy 2.12 (notice it is absent from the Options list in your output), which is why Scrapy prints the help text instead of crawling. Set the format through the output URI instead:

scrapy runspider merclibre.py -o mercadolibre.csv:csv

or let the format be inferred from the .csv extension:

scrapy runspider merclibre.py -o mercadolibre.csv

Alternatively, you can try running it with:

scrapy crawl mercadolibre -o mercadolibre.csv

However, the second command only works if your spider is part of a Scrapy project (with the proper project structure).
