I wrote my code following a tutorial and it's almost identical. The author runs it from the terminal and gets a .csv file as output, but when I run it, it just prints a long list of options for running the file and doesn't produce anything I want, and apparently no error is thrown. What am I doing wrong? This is how I run it:
(.venv) PS C:\Users\dodge\PycharmProjects\Scraper> scrapy runspider merclibre.py -o mercadolibre.csv -t csv
and this is the output:
2024-12-25 19:47:05 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2024-12-25 19:47:05 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.11.0, Python 3.12.6 (tags/v3.12.6:a4a2d2b, Sep 6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)], pyOpenSSL 24.3.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-11-10.0.22631-SP0
Usage
=====
  scrapy runspider [options] <spider_file>

Run the spider defined in the given file

Options
=======
-h, --help              show this help message and exit
-a NAME=VALUE           set spider argument (may be repeated)
-o FILE, --output FILE  append scraped items to the end of FILE (use - for
                        stdout), to define format set a colon at the end of
                        the output URI (i.e. -o FILE:FORMAT)
-O FILE, --overwrite-output FILE
                        dump scraped items into FILE, overwriting any existing
                        file, to define format set a colon at the end of the
                        output URI (i.e. -O FILE:FORMAT)

Global Options
--------------
--logfile FILE          log file. if omitted stderr will be used
-L LEVEL, --loglevel LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile FILE          write python cProfile stats to FILE
--pidfile FILE          write process ID to FILE
-s NAME=VALUE, --set NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure
A typical Scrapy spider structure should look like this:
import scrapy

class MercadoLibreSpider(scrapy.Spider):
    name = 'mercadolibre'  # This is important!
    start_urls = ['your_start_url_here']

    def parse(self, response):
        # Your parsing logic here
        pass
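Note that -o only writes out what parse() yields: if parse() returns nothing, the feed file stays empty. Here is a minimal sketch of a parse() that yields one row per result; the URL and CSS selectors are placeholders, not the actual ones from the tutorial, so adapt them to the page you are scraping:

import scrapy

class MercadoLibreSpider(scrapy.Spider):
    name = 'mercadolibre'
    # Placeholder: put the listing URL from the tutorial here.
    start_urls = ['https://example.com/listing']

    def parse(self, response):
        # Each yielded dict becomes one row in mercadolibre.csv.
        # Placeholder selectors: inspect the real page and adjust them.
        for item in response.css('li.result'):
            yield {
                'title': item.css('h2::text').get(),
                'price': item.css('span.price::text').get(),
            }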
If the spider is implemented correctly, runspider is the right command for a standalone spider file, but there is a problem with the invocation itself: the help text Scrapy printed shows that 2.12 has no -t option anymore. Scrapy rejects the unrecognized -t csv and prints its usage text instead of running the spider, which is exactly the behaviour you're seeing. Drop -t csv and let the .csv extension pick the format:

scrapy runspider merclibre.py -o mercadolibre.csv
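If you want to be explicit about the format instead of relying on the .csv extension, the help output above shows the supported syntax: a colon plus the format at the end of the output URI.

scrapy runspider merclibre.py -o mercadolibre.csv:csv

Using -O instead of -o overwrites the file rather than appending to it.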
Alternatively, you can try running it with:
scrapy crawl mercadolibre -o mercadolibre.csv
However, the second command only works if your spider is part of a Scrapy project, i.e. it has the proper project structure and the spider file sits in the project's spiders/ directory.
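For reference, this is roughly the layout that scrapy startproject scraper generates (the project name here is illustrative); scrapy crawl mercadolibre only works when you run it from inside such a project:

scraper/
    scrapy.cfg            # marks the project root
    scraper/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py       # SPIDER_MODULES points at scraper.spiders
        spiders/
            __init__.py
            merclibre.py  # your spider, found by its name 'mercadolibre'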